Abstract: The multiscale retinex with color restoration (MSRCR) has shown itself to be a
very versatile automatic image enhancement algorithm that simultaneously provides dynamic 
range compression, color constancy, and color rendition. A number of algorithms exist that 
provide one or more of these features, but not all. In this paper we compare the performance 
of the MSRCR with techniques that are widely used for image enhancement. Specifically, 
we compare the MSRCR with color adjustment methods such as gamma correction and 
gain/offset application, histogram modification techniques such as histogram equalization 
and manual histogram adjustment, and other more powerful techniques such as homomorphic 
filtering and “burning and dodging”. The comparison is carried out by testing the suite of
image enhancement methods on a set of diverse images. We find that though some of these 
techniques work well for some of these images, only the MSRCR performs universally well on 
the test set. 

3. D. J. Jobson, Z. Rahman, and G. A. Woodell, “Retinex Image Processing: Improved Fidelity
To Direct Visual Observation,” IS&T Fourth Color Imaging Conference: Color Science,
Systems, and Applications, Scottsdale, AZ, (November 1996).

Abstract: Recorded color images differ from direct human viewing by the lack of dynamic 
range compression and color constancy. Research is summarized which develops the cen- 
ter/surround retinex concept originated by Edwin Land through a single-scale design to a 
multi-scale design with color restoration (MSRCR). The MSRCR synthesizes dynamic range 
compression, color constancy, and color rendition, and, thereby, approaches fidelity to direct 
observation. 

4. Z. Rahman, D. J. Jobson, and G. A. Woodell, “Multiscale Retinex for Color Image Enhance- 
ment,” in Proceedings of the IEEE International Conference on Image Processing, Lausanne, 
Switzerland, (September 1996).

Abstract: The retinex is a human perception-based image processing algorithm which pro- 
vides color constancy and dynamic range compression. We have previously reported on a 
single-scale retinex (SSR) and shown that it can either achieve color/lightness rendition or 
dynamic range compression, but not both simultaneously. We now present a multi-scale 
retinex (MSR) which overcomes this limitation for most scenes. Both color rendition and dy- 
namic range compression are successfully accomplished except for some “pathological” scenes 
that have very strong spectral characteristics in a single band. 

5. F. O. Huck, C. L. Fales, and Z. Rahman, “On the Information-Theoretic Assessment of Visual 
Communication,” in Proceedings of the IEEE International Conference on Image Processing, 
(September 1996). 

Abstract: This paper deals with the extensions of information theory to the assessment of vi- 
sual communication from scene to observer. The mathematical development rigorously unites 
the electro-optical design of image gathering and display devices with the digital processing 
algorithms for image coding and restoration. Results show: 
This is the final report for NASA grant NAG-1-1847 and covers the period from 01 July, 1996 to
30 September, 1997. Enclosed are copies of the conference papers and journal articles for which 
the research was at least partially supported by the grant. 

The research during this period was in two primary, related areas. The first was the evaluation
of integrated information adaptive image compression. The second was the development of
a new technique for image enhancement. The NASA Langley Research Center and Science and
Technology Corporation (former employers of Z. Rahman) jointly applied for a patent on the latter
technique in May 1996. The patent was subsequently approved in April 1998, and is awaiting
publication.

During the cited time period we continued work on those papers that had been started before
the commencement of the grant. Some of these were already in the accepted-for-publication stage
but required revisions. Other papers were written and presented during the grant period. A list
of the publications is presented below along with the abstracts. A copy of each of the publications
is enclosed except for the book Visual Communication: An Information Theory Approach and the
two papers that appeared in the October 1996 issue of the Philosophical Transactions of the Royal
Society. These two papers were awarded the H. J. E. Reid Award for the Outstanding Paper in
June 1998.


1 Publications 

Publications are listed chronologically in each category. 

1.1 Conferences 

1. Z. Rahman, F. O. Huck, and C. L. Fales, “Informationally Optimized Image-gathering and
Restoration,” presented at the IS&T’s 50th Annual Conference, Cambridge, MA, (May 1997).

Abstract: The goal of image gathering and restoration often is to produce the best possible 
picture in terms of fidelity, sharpness and clarity. However, this goal cannot be attained, as
it has been pursued in the past, by treating image gathering and restoration as independent
tasks. Instead, in a clean departure from the mores of traditional image processing, we present 
an approach that rigorously uses modern communication theory to optimally combine the 
electro-optical design of the image gathering device with the digital processing algorithm for 
image restoration. Extensive simulations have shown that there exists a strong correlation 
between the information rate that is produced by the image gathering device and the image 
quality with which an image can be restored. 

2. Z. Rahman, G. A. Woodell, and D. J. Jobson, “A Comparison of the Multiscale Retinex With 
Other Image Enhancement Techniques,” Proceedings of the IS&T’s 50th Annual Conference, 
Cambridge, MA, (May 1997). 
• End-to-end system analysis closely correlates with measurable and perceptual performance
characteristics, such as data rate and image quality, respectively.

• The goal of producing the best possible image at the lowest possible image data rate 
can be realized only if (a) the electro-optical design of the image-gathering device is 
optimized for the maximum-realizable information rate and (b) the image-restoration 
algorithm properly accounts for the perturbations in the visual communication channel. 

6. Z. Rahman, D. J. Jobson, and G. A. Woodell, “Multiscale Retinex for Dynamic Range Compression
and Color Rendition,” Applications of Digital Image Processing XIX, Andrew G.
Tescher, Ed., Proc. SPIE 2847, (August 1996).

Abstract: The human vision system performs the tasks of dynamic range compression and 
color constancy almost effortlessly. The same tasks pose a very challenging problem for 
imaging systems whose dynamic range is restricted by either the dynamic response of film, in 
case of analog cameras, or by the analog-to-digital converters, in the case of digital cameras. 
The images thus formed are unable to encompass the wide dynamic range present in most 
natural scenes (often > 500:1). Whereas the human visual system is quite tolerant to
spectral changes in lighting conditions, these strongly affect both the film response for analog 
cameras and the filter responses for digital cameras, leading to incorrect color formulation in 
the acquired image. Our multiscale retinex, based in part on Edwin Land’s work on color 
constancy, provides a fast, simple, and automatic technique for simultaneous dynamic range 
compression and accurate color rendition. The retinex algorithm is non-linear, and global in
extent (the output at a point is also a function of its surround). A comparison with conventional
dynamic range compression techniques such as the application of point non-linearities, e.g. 
log(x,y), and global histogram equalization and/or modification shows that the multiscale 
retinex simultaneously provides the best dynamic range compression and color rendition. The 
applications of such an algorithm are many: from medical imaging to remote sensing, and
from commercial photography to color transmission. 

7. Z. Rahman, “Integrated wavelet compression and restoration,” Wavelet Applications in Signal
and Image Processing IV, Michael A. Unser, Akram Aldroubi, Andrew F. Laine, Eds.,
Proc. SPIE 2825, (August 1996).

Abstract: The performance of wavelet compression algorithms is generally judged solely 
as a function of the compression ratio and the visual artifacts which are perceivable in the 
reconstructed image. The problem then becomes one of obtaining the best compression 
with fewest visible artifacts — a very subjective measure. Our wavelet compression algorithm 
uses an information theoretic analysis for the design of the compression maps. We have 
previously shown that maximizing the information for a given visual communication channel 
also maximizes the visual quality of the restored image. We utilize this to design quantization 
maps which maximize information for a given compression ratio. Hence we are able to design 
quantization maps which maximize the restorability of an image (i.e. the information content,
the image quality, and the mean-square difference fidelity) for a given compression ratio.
1.2 Journal Articles


1. D. J. Jobson, Z. Rahman, and G. A. Woodell, “A Multi-Scale Retinex For Bridging the Gap
Between Color Images and the Human Observation of Scenes,” IEEE Transactions on Image
Processing, Special Issue on Color Processing, (July 1997).

Abstract: Direct observation and recorded color images of the same scenes are often strik- 
ingly different because human visual perception computes the conscious representation with 
vivid color and detail in shadows, and with resistance to spectral shifts in the scene illuminant. 
A computation for color images which approaches fidelity to scene observation must
combine dynamic range compression, color consistency — a computational analog for human 
vision color constancy — and color and lightness tonal rendition. In this paper, we extend a 
previously designed single scale center/surround retinex to a multi-scale version that achieves 
simultaneous dynamic range compression/color consistency/lightness rendition. This extension
fails to produce good color rendition for a class of images that contain violations of the
gray-world assumption implicit to the theoretical foundation of the retinex. Therefore we 
define a method of color restoration that corrects for this deficiency at the cost of a modest 
dilution in color consistency. Extensive testing of the multi-scale retinex with color restoration 
on several test scenes and over a hundred images did not reveal any pathological
behavior. 

2. D. J. Jobson, Z. Rahman, and G. A. Woodell, “Properties and Performance of a Center/Surround
Retinex,” IEEE Transactions on Image Processing, (March 1997).

Abstract: The last version of Edwin Land’s retinex model for human vision’s lightness and 
color constancy has been implemented and tested in image processing experiments. Previous 
research has established the mathematical foundations of Land’s retinex but has not subjected 
his lightness theory to extensive image processing experiments. We have sought to define a 
practical implementation of the retinex without particular concern for its validity as a model 
for human lightness and color perception. Here we describe the trade-off between rendition 
and dynamic range compression that is governed by the surround space constant. Further, 
unlike previous results, we find that the placement of the logarithmic function is important 
and produces best results when placed after the surround formation. Also unlike previous 
results, we find best rendition for a “canonical” gain/offset applied after the retinex operation. 
Various functional forms for the retinex surround are evaluated and a Gaussian form found 
to perform better than the inverse square suggested by Land. Images which violate the gray 
world assumptions (implicit to the retinex) are investigated to provide insight into cases where 
the retinex fails to produce a good rendition. 

3. F. O. Huck, C. L. Fales, and Z. Rahman, “An Information Theory of Visual Communication,”
Philosophical Transactions of the Royal Society A: Physical Sciences and Engineering,
(October 1996).

Abstract: The fundamental problem of visual communication is that of producing the best 
possible picture at the lowest data rate. We address this problem by extending information 
theory to the assessment of the visual communication channel as a whole, from image gathering
to image display. The extension unites the two disciplines, the electro-optical design of
image-gathering and display devices and the digital processing for image coding and restora- 
tion. The mathematical development leads to several intuitively attractive figures of merit for 
assessing the visual communication channel as a function of the critical limiting factors that 
constrain its performance. Multiresolution decomposition is included in the mathematical 
development to optimally combine the economical encoding of the transmitted signal with 
image gathering and restoration. 

Quantitative and qualitative assessments demonstrate that a visual communication channel
ordinarily can be expected to produce the best possible picture at the lowest data rate only
if the image-gathering device produces the maximum-realizable information rate and the
image-restoration algorithm properly accounts for the critical limiting factors that constrain
visual communication. These assessments encompass (a) the electro-optical design of the
image-gathering device in terms of the trade-off between blurring and aliasing in the presence
of photodetector and quantization noises, (b) the compression of data transmission by redundancy
reduction, (c) the robustness of the image restoration to uncertainties in the statistical
properties of the captured radiance field, and (d) the enhancement of the particular features
or, more generally, of the visual quality of the observed images. The “best visual quality” in
this context normally implies a compromise among maximum-realizable fidelity, sharpness,
and clarity which depends on the characteristics of the scene and the purpose of the visual
communication (e.g. diagnosis versus entertainment).

4. C. L. Fales, F. O. Huck, R. Alter-Gartenberg, and Z. Rahman, “Image Gathering and Digital
Restoration,” Philosophical Transactions of the Royal Society A: Physical Sciences and
Engineering, (October 1996).

Abstract: This paper seeks to unite two disciplines: the electro-optical design of the image 
gathering and display devices and the digital processing for image restoration. So far, these 
two disciplines have remained independent, following strictly separate traditions. However, the
best possible performance can be attained only when the digital processing algorithm accounts 
for the critical limiting factors of image gathering and display and the image-gathering device 
is designed to enhance the performance of the digital-processing algorithm. The following 
salient advantages accrue: 

(a) Spatial detail as fine as the sampling interval of the image-gathering device ordinarily 
can be restored sharply and clearly. 

(b) Even finer spatial detail than the sampling interval can be restored by combining a mul- 
tiresponse image-gathering sequence with a restoration filter that properly reassembles 
the within-passband and aliased signal components. 

(c) The visual quality produced by traditional image gathering (e.g. television camera) and 
reconstruction (e.g. cubic convolution) can be improved with a small-kernel restoration 
operator without an increase in digital processing. 
(d) The enhancement of radiance-field transitions can be improved for dynamic range compression
(to suppress shadow obscurations) and for edge detection (for computer vision).


1.3 Book 

1. F. O. Huck, C. L. Fales, and Z. Rahman, Visual Communication: An Information Theory
Approach, Kluwer Academic Publishers, (June 1997).

From the Publishers’ catalog: Visual Communication: An Information Theory Approach 
presents an entirely new look at the assessment and optimization of visual communication 
channels, such as are employed for telephotography and television. The electro-optical design 
of image gathering and display devices, and the digital processing for image coding and 
restoration, have remained independent disciplines which follow distinctly separate traditions; 
yet the performance of visual communication channels cannot be optimized just by cascading 
image-gathering devices, image-coding processors, and image-restoration algorithms as the 
three obligatory, but independent, elements of a modern system. Instead, to produce “the best
possible picture at the lowest data rate”, it is necessary to jointly optimize image gathering,
coding, and restoration. 

Although the mathematical development in Visual Communication: An Information Theory
Approach is firmly rooted in familiar concepts of communication theory, it leads to formu- 
lations that are significantly different from those that are found in the traditional literature 
on either rate distortion theory or digital image processing. For example, the Wiener filter, 
which is perhaps the most common image restoration algorithm in the traditional digital 
image processing literature, fails to fully account for the constraints of image gathering and 
display. As demonstrated in the book, digitally restored images improve in sharpness and 
clarity when these constraints are properly accounted for. 

Visual Communication: An Information Theory Approach is unique in its extension of modern 
communication theory to the end-to-end assessment of visual communication, from scene to 
observer. As such, it ties together the traditional textbook literature on electro-optical design 
and digital image processing. This book serves as an invaluable reference for image processing 
and electro-optical system design professionals and may be used as a text for advanced courses 
on the subject. 
Multiscale Retinex for Dynamic Range
Compression and Color Rendition


Z. Rahman, D. J. Jobson and G. A. Woodell

Applications of Digital Image Processing XIX
Andrew G. Tescher, Ed., Proc. SPIE 2847
(August 1996)



A Multiscale Retinex For Color Rendition and Dynamic Range Compression 

Zia-ur Rahman*

College of William & Mary, Williamsburg, VA 23187

Daniel J. Jobson and Glenn A. Woodell 

NASA Langley Research Center, Hampton, Virginia 23681 

Abstract 

The human vision system performs the tasks of dynamic range compression and color constancy almost 
effortlessly. The same tasks pose a very challenging problem for imaging systems whose dynamic range is restricted 
by either the dynamic response of film, in case of analog cameras, or by the analog-to-digital converters, in the 
case of digital cameras. The images thus formed are unable to encompass the wide dynamic range present in most 
natural scenes (often > 500:1). Whereas the human visual system is quite tolerant to spectral changes in lighting 
conditions, these strongly affect both the film response for analog cameras and the filter responses for digital 
cameras, leading to incorrect color formulation in the acquired image. Our multiscale retinex, based in part on 
Edwin Land’s work on color constancy, provides a fast, simple, and automatic technique for simultaneous dynamic 
range compression and accurate color rendition. The retinex algorithm is non-linear, and global — output at a 
point is also a function of its surround — in extent. A comparison with conventional dynamic range compression 
techniques such as the application of point non-linearities, e.g. log(x,y), and global histogram equalization and/or
modification shows that the multiscale retinex simultaneously provides the best dynamic range compression and 
color rendition. The applications of such an algorithm are many: from medical imaging to remote sensing, and
from commercial photography to color transmission. 


1. Introduction 

Human perception excels at constructing a visual representation with vivid color and detail across wide ranging 
photometric levels caused by lighting variations. In addition human vision computes color so as to be relatively 
independent of spectral variations in illumination [1]. The images obtained with film and electronic cameras suffer,
by comparison, from a loss in clarity of detail and color as light levels drop within shadows, or as distance from 
a lighting source increases. When the dynamic range of a scene exceeds the camera's dynamic range, there can 
be irrevocable loss of visual information at both extremes of the scene dynamic range. Improved fidelity of color 
images to human observation should, therefore, combine dynamic range compression, color constancy, and color and 
lightness rendition. In this paper we present our initial work in developing a technique, the multiscale retinex with 
color restoration (MSRCR), which achieves all these goals. 

The idea of the retinex was conceived by Edwin Land [2, 3, 4] as a model of the lightness and color perception of
human vision. Subsequently Hurlbert [5, 6], and Hurlbert and Poggio [7], studied the properties of the center/surround
form of the retinex and other lightness theories and found a common mathematical foundation which possesses 
some excellent properties but cannot actually compute reflectance for arbitrary scenes. Certain scenes violate the 
“gray-world” assumption which requires that the average reflectances in the surround be equal in the three spectral
color bands. For example, scenes that are dominated by one color — “monochromes” — clearly violate this assumption 
and are forced to gray (equal values in all spectral channels) by the retinex computation. Hurlbert 0 further showed 
the lightness problem has a solution that has a center/surround spatial form. This suggests the possibility that 
the spatial opponency of the center/surround is a general solution to estimating relative reflectances for arbitrary 
lighting conditions. At the same time it is equally clear that human vision does not determine relative reflectance 
but rather a context dependent relative reflectance since surfaces in shadow do not appear to be the same lightness 
as the same surface when lit. Moore et al. [8, 9] took up the retinex problem as a natural implementation for analog
VLSI resistive networks and found that color rendition was dependent on scene content: some scenes worked well,
others did not. These studies also pointed out the problems that occur with color Mach bands and the graying out
of large uniform zones of color.

* Funded by NASA Langley Research Center Contract #NAS1-19603 to Science and Technology Corporation, and by Grant #NAG1-1847
to the College of William & Mary.

The MSRCR builds on the single scale retinex [10] (SSR) and the multiscale retinex [11] (MSR). Both the SSR and
the MSR provide very good dynamic range compression but suffer from the graying out which occurs in large areas 
of uniform color. Hence the overall color/lightness rendition can be poor depending upon the scene. The MSRCR 
alleviates this problem by using a color restoration function which controls the amount of color saturation for the 
final rendition. This function provides the color restoration that is needed with the dynamic range compression to 
approximate the performance of human vision with a computation that is quite automatic and reasonably simple. 
The MSRCR is extremely useful for enhancing 8-bit color images that suffer from lighting deficiencies commonly 
encountered in architectural interiors and exteriors, landscapes, and non-studio portraiture applications. Potential 
benefits for remote sensing applications are improved visibility of color and detail in shadows and low reflectance 
zones and the diminution of sun angle/atmospheric signal variations that can lead to more resilient and accurate 
multispectral classification. 


2. Multiscale Center/Surround Retinex 

The SSR [10, 12, 13] is given by

    R_i(x, y) = \log I_i(x, y) - \log [F(x, y) * I_i(x, y)]    (1)

where R_i(x, y) is the retinex output, I_i(x, y) is the image distribution in the ith color spectral band, * denotes the
convolution operation, and F(x, y) is the surround function,

    F(x, y) = K e^{-(x^2 + y^2)/c^2},

where c is the Gaussian surround space constant, or the scale, and K is selected such that

    \iint F(x, y) \, dx \, dy = 1.
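As an illustration, the single-scale computation can be sketched in NumPy. The sketch assumes the Gaussian surround F(x, y) = K exp(-(x^2 + y^2)/c^2); in the continuous case the unit-integral condition gives K = 1/(pi c^2), and the discrete kernel below is instead normalized by its sum. The function names, the offset `eps` that keeps the logarithm finite, and the use of circular FFT convolution are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def gaussian_surround(shape, c):
    """Surround F(x, y) = K * exp(-(x^2 + y^2) / c^2), with K set so that F sums to 1."""
    h, w = shape
    # Signed pixel offsets in FFT order, so the kernel is centred on index (0, 0).
    y = np.fft.fftfreq(h) * h
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    f = np.exp(-(xx**2 + yy**2) / c**2)
    return f / f.sum()  # discrete counterpart of the unit-integral condition


def single_scale_retinex(band, c, eps=1.0):
    """Eq. (1) for one spectral band: log I - log(F * I)."""
    img = band.astype(np.float64) + eps  # eps is an assumed offset, not from the paper
    surround = gaussian_surround(img.shape, c)
    # Circular convolution via the FFT; image borders wrap around in this sketch.
    blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(surround)))
    return np.log(img) - np.log(blurred)
```

Because the surround sums to one, a uniform band maps to zero, and with `eps=0` the output is unchanged when the whole band is multiplied by a constant, which is the illumination independence that motivates the log-ratio form.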


The MSR output is simply the weighted sum of the outputs of several SSRs with different scales. Mathematically,

    R_{M_i}(x, y) = \sum_{n=1}^{N} w_n R_{n_i}(x, y),    (2)

where N is the number of scales, R_{n_i}(x, y) is the ith component of the nth scale, R_{M_i}(x, y) is the ith color component
of the MSR output, and w_n is the weight associated with the nth scale. The number of scales is application dependent.
However, after experimenting with one small scale and one large scale, the need for a third intermediate scale was 
immediately apparent in order to produce a graceful rendition without visible “halo” artifacts near strong edges. 
Experimentation shows that assigning equal weights to the scales is adequate for most applications, although a 
particular scale could be weighted more heavily if a particular feature needs to be enhanced. For instance, weighting 
the smallest scale heavily can be used to achieve the strongest dynamic range compression but leads to ungraceful 
edge artifacts and some graying of uniform color zones in the rendition. 

To test whether the dynamic range compression of the MSR approaches that of human vision, we use test SCENES,
not just test images, to facilitate the comparison between the processed image and direct observation. An example 
(Fig. 1) illustrates the complementary strengths and weaknesses of each scale taken separately and the strength of 
the multiscale synthesis. This image is representative of a number of test scenes (Fig. 2) where for conciseness we 
show only the multiscale result. The comparison of the unprocessed images to the perception of the scene produces 
some striking and unexpected results. Compared to recorded images, the color and detail are far more vivid for 




Figure 1: The components of the multiscale retinex which show their complementary information content. The 
smallest scale is strong on detail and dynamic range compression and weak on tonal and color rendition. The reverse 
is true for the largest spatial scale. The multiscale retinex combines the strengths of each scale and mitigates the 
weaknesses of each. 


direct observation not only in shadowed regions, but also in the bright zones of the scene. This suggests that human 
vision is perhaps doing more than just strong dynamic range compression and that enhancements beyond the MSR 
may be needed to capture the realism of direct viewing. 

A sample of image data for surfaces in both sun and shadow indicates a dynamic range compression of 2:1 for
the MSR compared to the 3:1 to 5:1 measured in our perceptual tests. For the SSR this value is 1.5:1 or less.
These levels of dynamic range compression are for outdoor scenes which have shadows of large spatial extent. The
much higher values of compression that occur for the human visual perception of mixed indoor/outdoor scenes are
compared to retinex performance in Fig. 2 (right). The foreground orange book on the grayscale is compressed by
approximately 5:1 for the MSR while compression for the SSR is only approximately 3:1, both relative to the bright
building facade in the background. For this case, the compression of human vision is difficult to estimate since both
the color and texture of the two surfaces are quite different. Our impression is that the MSR is approaching human
vision's performance but not quite reaching it.

The MSR performs well in terms of dynamic range compression but its performance on the pathological classes of 








Figure 2: Examples of test SCENES processed with the multiscale retinex prior to color restoration. While color
rendition of the left image is good, the other two are “grayed” to some extent. Dynamic range compression and tonal
rendition are good for all and compare well with scene observation.


images examined in previous SSR research [10] (Fig. 3 middle row) still needs to be considered. These images represent
a variety of regional and global gray-world violations and we cannot expect the MSR to handle them effectively.
We provide these results as a baseline for comparison with the color restoration which is developed next. All possess
notable, and often serious, defects in color rendition. Since we want the MSR to be automatic, and the pathological
images cannot be determined a priori, we developed an additional color computation which is universally applied to
all post-retinex images to produce a general purpose computation.


3. A Color Restoration Method for the Multi-scale Retinex

The general effect of retinex processing on images with regional or global gray-world violations is a graying out
of the image either in specific regions or globally. This desaturation of color can, in some cases, be severe (Fig. 3
middle). Therefore we can consider the desired color computation as a color restoration, which should produce good
color rendition for images with any degree of graying. More rarely, the gray-world violations can simply produce an
unexpected color distortion (Fig. 3 top-left). Again we seek a simple computation which also handles these cases. In
addition we would like the correction to preserve a reasonable degree of color constancy since that is one of the
basic motivations for the retinex. Color constancy is known to be imperfect in human visual perception, so some
level of illuminant color dependency is acceptable provided it is much lower than the physical spectrophotometric
variations. Ultimately this is a matter of image quality, and color dependency is tolerable to the extent that the
resulting defect is not visually too strong.
Figure 3: Pathological “gray-world” violations are not handled well by the multiscale retinex alone (middle row) but
are treated successfully when color restoration is added (lower row).


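Starting with the foundations of colorimetry [14], the color space is transformed using

    I'_i(x, y) = \frac{I_i(x, y)}{\sum_{j=1}^{N} I_j(x, y)},    i = 1, \ldots, N.    (3)

The color restoration function C(x, y) is then simply

    C(x, y) = C_i(x, y) = f[I'_i(x, y)],

where I'_i(x, y) represents the linearly or non-linearly normalized color space, and f controls the saturation of the final rendition.
In Eq. (3), N counts the spectral bands rather than the scales. The MSRCR is then given by

    R_i(x, y) = C(x, y) \sum_{n=1}^{N} w_n (\log [I_i(x, y)] - \log [I_i(x, y) * F_n(x, y)]),    (4)

where F_n(x, y) is the surround function of Eq. (1) evaluated at the nth scale.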


This form provides dynamic range compression, color and lightness constancy, and very good color rendition. 
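A minimal sketch of the color restoration step follows. It assumes the normalized color space is each band divided by the sum over bands at the same pixel, and it takes the multiscale retinex output as an already-computed array. The text does not specify the function f, so f(u) = log(1 + alpha u) is used here purely as a placeholder; `alpha`, `eps`, and the function names are hypothetical.

```python
import numpy as np


def color_restoration(rgb, alpha=100.0, eps=1.0):
    """C_i = f[I'_i], with I'_i = I_i / sum_j I_j (assumed normalization).

    f(u) = log(1 + alpha * u) is an assumed placeholder; the paper leaves f open.
    """
    img = rgb.astype(np.float64) + eps             # eps (assumed) avoids division by zero
    chroma = img / img.sum(axis=2, keepdims=True)  # I'_i: bands sum to 1 at every pixel
    return np.log1p(alpha * chroma)


def msrcr(rgb, msr):
    """Multiply the per-band MSR output (shape H x W x bands) by the restoration term."""
    return color_restoration(rgb) * msr
```

With this choice a neutral gray pixel receives the same factor in every band, while a band that dominates a pixel receives a larger factor than the others, which is the behavior needed to counteract graying of strongly colored regions.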


4. Selected Results for Diverse Test Cases 

The test images presented here begin with some test scenes. We feel it is fundamental to refer the processed images 
back to the direct observation of scenes. This is necessary to establish how well the computation represents




Figure 4: Test scenes illustrating dynamic range compression, color and tonal rendition, and automatic exposure 
correction. All processed images compare favorably with direct scene observation with the possible exception of the
leftmost image which is even lighter and clearer for observation. This scene has the widest dynamic range and
suggests that even stronger dynamic range compression may be needed for this case. 


a result that is u what you would have seen if you had been there' 1 . Clearly we cannot duplicate human visions 
peripheral vision which spans almost 180°. but within the narrower angle of most image frames we would like to 
demonstrate that the computation achieves the clarity of color and detail in shadows, reasonable color constancy and 
lightness and color rendition that is present in direct observation of scenes. While we cannot yet test performance 
for scenes that go beyond 8-bit dynamic ranges, these results support the utility of the processing scheme for the 
enhancement of conventional 8-bit color images. The test scenes are given first (Figs. 4, 5) so that we can describe 
the degree to which the computation approaches human visual performance. All the test scene images after retinex 
processing are quite “true to life” compared with direct observation. We did not carefully match camera spatial 
resolution to observation so some difference in perceived detail is expected and observed. However overall color, 
lightness, and detail rendering for the multiscale retinex is a good approximation to human visual perception. 


5. Discussion 

The question which now arises is: What advantages does the MSRCR possess over traditional image enhancement
techniques such as histogram equalization, non-linear transforms (gamma correction), and gain/offset manipulation?
Again the answer is based on experimental observation, rather than on theory. Each of the traditional techniques is
well suited for a certain class of images, where the overall contrast is poor. They almost invariably fail where the
image simultaneously contains very bright and very dark areas. They also fail to preserve the color when applied to
images where the need for enhancement is not readily observable. The MSRCR successfully overcomes both these
weaknesses of the traditional techniques. Figure 6 shows a comparison of the MSRCR with the traditional techniques
for two natural scenes. The first is a typical outdoor scene which has a sharp shadow across the frame, and
the second is a good image which does not obviously need image enhancement. In both cases, the output of the
MSRCR is either better than the original or as good. The same cannot be said of the traditional techniques.
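For reference, the three traditional techniques named above can each be written as a global point operation on a single 8-bit channel. The sketch below is only an illustration: the parameter values and function names are arbitrary assumptions, not the settings used for Figure 6. Each function applies the same mapping to every pixel regardless of its surroundings, which is one way to see why a single setting struggles when very bright and very dark areas share a frame.

```python
import numpy as np


def gamma_correct(channel, gamma=0.5):
    """Non-linear transform: normalize to [0, 1], raise to `gamma`, rescale."""
    x = channel.astype(np.float64) / 255.0
    return np.round(255.0 * x ** gamma).astype(np.uint8)


def gain_offset(channel, gain=1.5, offset=-20.0):
    """Linear stretch y = gain * x + offset, clipped to the 8-bit range."""
    y = gain * channel.astype(np.float64) + offset
    return np.clip(np.round(y), 0, 255).astype(np.uint8)


def histogram_equalize(channel):
    """Map each level through the cumulative histogram of the channel."""
    hist = np.bincount(channel.ravel(), minlength=256)  # needs a uint8 input
    cdf = np.cumsum(hist) / channel.size
    lut = np.round(255.0 * cdf).astype(np.uint8)  # look-up table, one entry per level
    return lut[channel]
```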

The MSRCR can be applied ex post facto to 8-bit color images to provide image enhancement. The only
problem arises when these images have been compressed using lossy methods. Not only does the MSRCR improve
the dynamic range and color, it also enhances the compression artifacts which had been imperceptible before the
application. Hence, the retinex is best applied prior to lossy image coding. One obvious advantage that the MSRCR
provides for image compression is its ability to compress wide dynamic ranges to 8 bits or fewer per color band at the output,
while preserving, and even enhancing, the details in the scene. The overall effect then is a significant reduction in
the number of bits required to transmit the original (especially in cases where the original color resolution is higher
than 8 bits per band), without a substantial loss in spatial resolution or contrast quality.

Figure 5: Photographic examples further illustrating graceful dynamic range compression together with tonal and
color rendition. The rightmost image shows the processing scheme handling saturated colors quite well and not
distorting an image that is quite good in its original form.

Figure 6: A comparison of image enhancement techniques: (a) MSRCR with 3 scales, (b) Histogram Equalization,
(c) Gamma correction, and (d) Gain/offset manipulation.

We have encountered many digital images in our testing which are either under- or overexposed. Apparently even
with modern photographic auto-exposure controls, exposure errors can and do occur. An additional benefit of the
MSRCR is its apparent capability for exposure correction. This is especially beneficial if it is performed before the
image is recorded either on film or on disk.


6. Conclusions 

The SSR provides a good mechanism for enhancing certain aspects of images and providing dynamic range
compression. However, it is limited in its use because it can provide either good tonal rendition or dynamic
range compression, but not both. The MSR, comprised of three scales (small, intermediate, and large), overcomes this limitation
and was found to synthesize dynamic range compression, color constancy, and tonal rendition and produce results
which compare favorably with human visual perception except for scenes which contain violations of the gray-world
assumption. Even when the gray-world violations were not dramatic, some desaturation of color was found to
occur. The MSRCR adds a color restoration scheme which produced good color rendition even for severe gray-world
violations, but at the expense of a slight sacrifice in color constancy. While there is no firm theoretical or
mathematical basis for proving the generality of the MSRCR, we have tested it successfully on numerous diverse
scenes and images, including some known to contain severe gray-world violations.


7. Note to readers 

Color versions of the figures which appear in this paper are available upon request. Please send e-mail to
zrahman@cs.wm.edu or U.S. mail to Zia-ur Rahman, Department of Computer Science, College of William & Mary,
P.O. Box 8795, Williamsburg, VA 23187-8795.
Abstract

The performance of wavelet compression algorithms is generally judged solely as a function of the compression 
ratio and the visual artifacts which are perceivable in the reconstructed image. The problem then becomes one of 
obtaining the best compression with fewest visible artifacts — a very subjective measure. Our wavelet compression 
algorithm uses an information theoretic analysis for the design of the compression maps. We have previously shown 
that maximizing the information for a given visual communication channel also maximizes the visual quality of the 
restored image. We utilize this to design quantization maps which maximize information for a given compression 
ratio. Hence we are able to design quantization maps which maximize the restorability of an image — i.e. the 
information content, the image quality, and the mean-square difference fidelity — for a given compression ratio. 

KEY WORDS: Image restoration, image compression, image quality 


1. Introduction 

Image compression algorithms are generally evaluated in terms of the amount of data compression, a measurable 
quantity, and the visual quality for this data rate, a subjective quantity. Neglected in the design of compression 
algorithms and the evaluation process are the effects on the data rate and the quality of the image due to the 
image acquisition and display systems. Some effort has been made to relate the effects of display into evaluation 
of image quality 1 but none in incorporating the characteristics of the image gathering system into the design of the 
compression algorithm. We present a new approach to designing and evaluating image compression algorithms in 
terms of the information transmitted by the visual communication channel. This approach incorporates the effects 
of the image gathering device characterized by the signal to noise ratio and the spatial frequency response (SFR) of
the combined optics and photodetector array, the quantization due to the analog-to-digital (A/D) converter, and the 
errors due to the compression process into the design of the quantization maps. These maps are related to visual 
quality by the amount of information they allow through. Previously, 2-6 we have combined Shannon’s information 
theory 7 with Wiener’s restoration filter 8 and with the critical limiting factors that affect a visual communication 
channel to provide rigorous quantitative metrics for characterizing its design and evaluating its performance in terms 
of restorability. We now integrate lossy data compression into this framework to optimize data rates 9 and evaluate 
its effects on image resolution, and hence, quality. 

Figure 1 shows the visual communication channel. At the head of the communication channel is an image- 
gathering device which consists of a lens, a photodetector array, and an analog-to-digital (A/D) converter. This 
device converts the radiance field incident on it into a quantized, digital signal which is then transmitted. At the 
tail of the channel is a receiver which resolves the signal and provides the information to an image display device 
(e.g. video monitor, or a printer) which, in turn, represents this information in a form suitable for interpretation by 
an observer. Between the output of the A/D converter and the receiver, any number of digital image processing 
algorithms can be applied for image enhancement and efficient data representation and transmission. Traditional 
analysis of image compression and restoration algorithms is generally restricted to this stage only, neglecting the
analog-to-digital conversion at image acquisition and digital-to-analog conversion at image display. This leads to an 
inherently incomplete model which results in restorations and compression rates which could have been better if the 
conversions at acquisition and display had been properly taken into account. 

The imaging process injects errors in the original information that is incident on the image gathering device.
The combined SFR of the lens and the photosensor array blurs the radiance field; the photosensor array and the
A/D converter introduce electronic and quantization noise respectively; and the SFR of the display spot of the
image-display device blurs the resolved signal. These errors result in a reduction in the amount of information
that is received by the observer when compared with the information which was present at the beginning of the
imaging process. Since for many applications (e.g. remote sensing) the only information the observer has is this
received information, it is advantageous to minimize the channel effects so as to maximize information. A rigorous
mathematical analysis provides the framework to evaluate the performance of the communication channel and can
thus be used to informationally optimize its design in terms of both resolution (restoration) and compression. We feel
that this is essential in designing a channel which provides the most information, and hence the highest resolution,
for the least data.

†This research was funded by NASA Langley Research Contract #NAS1-19603 to Science and Technology Corporation and by NASA
Langley Research Grant #NAG1-1847 to the College of William & Mary.

Figure 1: The visual communication channel


2. Mathematics of the visual communication channel 

The visual communication channel is divided into five stages: image-gathering, decomposition, quantization, 
synthesis, and restoration. Though we present the mathematics of each stage individually, each stage builds on the 
preceding stages and the restoration filters depend upon the end-to-end process. 

2.1. Image gathering 

The image gathering device converts the incident continuous radiance field $L(x, y)$ into the discrete signal $s(x, y)$
(Figure 2). The combined SFR of the device optics and photosensor array aperture, $\tau_d(x,y)$, blurs the input $L(x,y)$,
which is sampled by the rectangular unit sampling lattice, $|||(x,y)$, and corrupted by the additive noise due to the
analog-to-digital (A/D) conversion, $N_{a/d}(x,y)$, and the electronics, $N_e(x,y)$, producing

$$s(x,y) = \left[K L(x,y) * \tau_d(x,y)\right] ||| + N_e(x,y) + N_{a/d}(x,y), \tag{1a}$$

where $K$ is the steady-state linear radiance-to-signal conversion gain. The unit sampling lattice is given by
$||| = \sum_{m,n=-\infty}^{\infty} \delta(x - m, y - n)$.

Figure 2: Image gathering: The radiance field $L(x,y)$ is converted to signal $s(x,y)$.

Rewriting in the Fourier domain,

$$\tilde{s}(\upsilon,\omega) = \left[K \hat{L}(\upsilon,\omega)\,\hat{\tau}_d(\upsilon,\omega)\right] * \hat{|||} + \tilde{N}_e(\upsilon,\omega) + \tilde{N}_{a/d}(\upsilon,\omega), \tag{1b}$$

where the notation $\hat{p}(\upsilon,\omega)$ refers to the continuous Fourier transform of a function $p(x,y)$ and $\tilde{q}(\upsilon,\omega)$ refers to
the discrete Fourier transform of a function $q(x,y)$. The Fourier transform of the sampling lattice $|||$ is given by
$\hat{|||} = \sum_{m,n=-\infty}^{\infty} \delta(\upsilon - m, \omega - n)$, with the associated sampling passband $\hat{B} = \{\upsilon,\omega : |\upsilon|, |\omega| < 0.5\}$. The probability
density function of the noise $N_{a/d}(x,y)$ can be written as

$$p\left[N_{a/d}(x,y)\right] = \frac{\kappa}{s_{p_{\max}}(x,y) - s_{p_{\min}}(x,y)} = \frac{\kappa}{2k\sigma_s}, \tag{2}$$

for $s_{p_{\max}} = k\sigma_s$ and $s_{p_{\min}} = -k\sigma_s$, which specify the range of the signal; $\kappa$ is the number of quantization levels of
the A/D converter; and $\sigma_s^2 = \iint_{\hat{B}} \tilde{\Phi}_s(\upsilon,\omega)\, d\upsilon\, d\omega$. The power spectral density (PSD) of the signal prior to
quantization is

$$\tilde{\Phi}_s(\upsilon,\omega) = E\left[\tilde{s}(\upsilon,\omega)\,\tilde{s}^*(\upsilon,\omega)\right] = K^2 \hat{\Phi}_L(\upsilon,\omega)\left|\hat{\tau}_d(\upsilon,\omega)\right|^2 * \hat{|||} + \tilde{\Phi}_{N_e}(\upsilon,\omega), \tag{3}$$

where $E[\cdot]$ is the expectation operator, and $^*$ indicates complex conjugation. Assuming that the error within each
quantization interval is uncorrelated with errors within other intervals.



2.2. Decomposition 

Figure 3 shows signal decomposition, synthesis, and restoration. The signal $s(x,y)$ is decomposed into $N$
parts, at $L$ levels, using the discrete wavelet transform:

$$s_{0,1}(x,y) = s(x,y) \tag{5a}$$

$$s_{l,\beta}(x,y) = \left[s_{l-1,1}(x,y) * \tau_{A_l,\beta}(x,y)\right] |||_{XY}, \quad \beta = 1, \ldots, N, \quad l = 1, \ldots, L, \tag{5b}$$

where $s_{l-1,1}(x,y)$ is the “low frequency” band from the previous level, $\tau_{A_l,\beta}(x,y)$ are the wavelet analysis filters at
the current level and $|||_{XY} = XY \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \delta(x - mX, y - nY)$ is the downsampler by $X = Y = \sqrt{N}$. In
the frequency domain this can be seen as dividing the passband into $N$ segments, each occupying $1/N$th of the original
bandwidth:

$$\tilde{s}_{0,1}(\upsilon,\omega) = \tilde{s}(\upsilon,\omega) \tag{6a}$$

$$\tilde{s}_{l,\beta}(\upsilon,\omega) = \left[\tilde{s}_{l-1,1}(\upsilon,\omega)\,\tilde{\tau}_{A_l,\beta}(\upsilon,\omega)\right] * \hat{|||}_{\upsilon\omega}, \quad \beta = 1, \ldots, N, \quad l = 1, \ldots, L, \tag{6b}$$

where the $\tilde{\tau}_{A_l,\beta}$ are, generally, orthogonal, and $\hat{|||}_{\upsilon\omega} = \sum_{m=0}^{X-1} \sum_{n=0}^{Y-1} \delta(\upsilon - m/X, \omega - n/Y)$. The signals $\tilde{s}_{l,\beta}$ occupy
different frequency bands. Each signal can possess quite different characteristics and hence be amenable to different
methods and rates of quantization. This provides a versatile method for efficient signal representation.

Equations 5 show the relationship between the signal $s(x,y)$ and the decomposed signals $s_{l,\beta}(x,y)$ in terms of
the wavelet analysis filters and the downsampler. More explicitly, using Equations 1,

$$s_{1,\beta}(x,y) = \left[\left\{\left[K L(x,y) * \tau_d(x,y)\right] ||| + N_e(x,y) + N_{a/d}(x,y)\right\} * \tau_{A_1,\beta}(x,y)\right] |||_{XY}, \tag{7a}$$

$$\tilde{s}_{1,\beta}(\upsilon,\omega) = \left[K \hat{L}(\upsilon,\omega)\,\hat{\tau}_d(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\right] * \hat{|||}_1 + \left[\tilde{N}_e(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega) + \tilde{N}_{a/d}(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\right] * \hat{|||}_{\upsilon\omega}, \tag{7b}$$

where $\hat{|||}_1 = \sum_{m,n=-\infty}^{\infty} \delta(\upsilon - m/X, \omega - n/Y)$ is the combination of the uniform rectangular sampling lattice of the
image-gathering device and the downsampler. It has sampling intervals of $(X, Y)$ and associated passband
$\hat{B}_l(\upsilon,\omega) = \{\upsilon,\omega : |\upsilon|, |\omega| < 2^{-(l+1)}\}$.

Figure 3: Wavelet decomposition, quantization, and restoration
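As a concrete instance of Equations 5, the sketch below performs a separable 2-D Haar decomposition, which yields N = 4 bands per level with downsampling by X = Y = 2 in each direction, and recurses on the low-frequency band. The Haar filters are an assumption chosen for brevity (the text does not name its analysis filters at this point), and the function names are illustrative only.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_analysis(signal):
    """One level: filter and downsample by 2 along x, then along y (4 bands)."""
    s = np.asarray(signal, dtype=np.float64)
    if s.shape[0] % 2 or s.shape[1] % 2:
        raise ValueError("both dimensions must be even")
    lo = (s[:, 0::2] + s[:, 1::2]) / SQRT2  # low-pass along x
    hi = (s[:, 0::2] - s[:, 1::2]) / SQRT2  # high-pass along x
    ll = (lo[0::2] + lo[1::2]) / SQRT2      # the "low frequency" band
    lh = (lo[0::2] - lo[1::2]) / SQRT2
    hl = (hi[0::2] + hi[1::2]) / SQRT2
    hh = (hi[0::2] - hi[1::2]) / SQRT2
    return ll, lh, hl, hh


def haar_synthesis(ll, lh, hl, hh):
    """Inverse of haar_analysis; exact when the bands are not quantized."""
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    s = np.empty((lo.shape[0], 2 * lo.shape[1]))
    s[:, 0::2], s[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return s


def decompose(signal, levels):
    """Recursive decomposition: only the low band is passed to the next level."""
    details = []
    low = np.asarray(signal, dtype=np.float64)
    for _ in range(levels):
        low, lh, hl, hh = haar_analysis(low)
        details.append((lh, hl, hh))
    return low, details
```

Because these filters are orthogonal, the total energy of the four bands equals that of the input, and synthesis recovers the input exactly when no quantization is applied.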

2.3. Quantization 

The wavelet coefficients are quantized in the spatial-frequency domain. Each coefficient $\tilde{s}_{l,\beta}(\upsilon,\omega)$ contains a
measurable amount of information $\mathcal{H}_{l,\beta}(\upsilon,\omega)$ (§3). McCormick et al. 10 and Huck 4, 11 have shown that maximizing
information leads to maximizing the image quality. Hence the coefficients are quantized under the constraint of
maximizing total information for the number of bits used to represent the signal. The quantization transforms the
wavelet coefficient $\tilde{s}_{l,\beta}(\upsilon,\omega)$ into a representation $Q[\tilde{s}_{l,\beta}(\upsilon,\omega)]$. The inverse process transforms the $Q[\tilde{s}_{l,\beta}(\upsilon,\omega)]$ back
to the coefficients plus, perhaps, a noise term:

$$\tilde{s}'_{l,\beta}(\upsilon,\omega) = \tilde{s}_{l,\beta}(\upsilon,\omega) + \tilde{N}_{Q_l,\beta}(\upsilon,\omega), \quad \beta = 2, \ldots, N$$

A' 

&=i 

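where $\tilde{s}'_{l,\beta}(\upsilon,\omega)$ are the dequantized coefficients, and $\tilde{N}_{Q_l,\beta}(\upsilon,\omega)$ represents the “noise” due to the
quantization/dequantization process. The initial conditions are given by

$$\tilde{s}'_{1,\beta}(\upsilon,\omega) = \left[K \hat{L}(\upsilon,\omega)\,\hat{\tau}_d(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\right] * \hat{|||}_1 + \left[\tilde{N}_e(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\right] * \hat{|||}_{\upsilon\omega} \tag{8a}$$

$$\qquad\qquad + \left[\tilde{N}_{a/d}(\upsilon,\omega)\,\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\right] * \hat{|||}_{\upsilon\omega} + \tilde{N}_{Q_1,\beta}(\upsilon,\omega). \tag{8b}$$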
Assuming uniform quantization for each wavelet coefficient over the range $\pm k\sigma_{l,\beta}(\upsilon,\omega)$, we can write the PSD of
the coefficient quantization noise as

$$\tilde{\Phi}_{N_{Q_l,\beta}}(\upsilon,\omega) = \frac{1}{3}\left(\frac{k\,\sigma_{l,\beta}(\upsilon,\omega)}{\kappa_{\beta,l}(\upsilon,\omega)}\right)^2, \tag{9}$$

where $\sigma_{l,\beta}(\upsilon,\omega)$ are the ensemble standard deviations for the coefficient at frequency $(\upsilon,\omega)$ in band $\beta$ and level $l$,
and $\kappa_{\beta,l}(\upsilon,\omega)$ is the number of quantization levels for that frequency location.
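The factor of 1/3 in Equation 9 is what one obtains from the standard variance of a uniformly distributed rounding error. The short derivation below is a sketch under that assumption (the error is taken to be uniform within one quantization step, which uniform quantization suggests but the text does not spell out). The subscripts and frequency arguments are dropped for readability, and the numerical values $k = 3$ and $\kappa = 256$ are illustrative assumptions only.

```latex
% Step size when the range [-k\sigma, +k\sigma] is divided into \kappa levels:
\Delta = \frac{2k\sigma}{\kappa}

% Variance of an error n that is uniform on [-\Delta/2, \Delta/2]:
\operatorname{Var}[n]
  = \int_{-\Delta/2}^{\Delta/2} \frac{n^2}{\Delta}\, dn
  = \frac{\Delta^2}{12}
  = \frac{1}{12}\left(\frac{2k\sigma}{\kappa}\right)^2
  = \frac{1}{3}\left(\frac{k\sigma}{\kappa}\right)^2

% Illustrative numbers (assumed): k = 3, \kappa = 256 (8 bits)
\frac{1}{3}\left(\frac{3\sigma}{256}\right)^2 \approx 4.6 \times 10^{-5}\, \sigma^2
```

Under this model, removing one bit halves $\kappa$ and therefore quadruples the quantization noise power for that coefficient.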


2.4. Synthesis 

The synthesis filters reconstruct the decomposed signals. When the coefficients are not quantized, this can be
done perfectly. 12, 13, 14 With lossy quantization, however, the synthesis filters are generated to minimize the
mean square restoration error (MSRE) $\epsilon^2(\upsilon,\omega)$ between the decomposed and reconstructed signal. The MSRE is given by

$$\epsilon^2(\upsilon,\omega) = E\left[\left|\tilde{s}_{l,1}(\upsilon,\omega) - \tilde{s}'_{l,1}(\upsilon,\omega)\right|^2\right],$$

where $\tilde{s}_{l,1}(\upsilon,\omega)$ is the input and $\tilde{s}'_{l,1}(\upsilon,\omega)$ is the output to level $l + 1$. Using Equations 6 and 8,

e (iW) 




*S,>,«)|?> I+I .> 1 w)| 2 +^VQ l + ,,( t ’’ W ) 


( 10 ) 


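where $\tilde{\Phi}_{s_{l,1}}(\upsilon,\omega)$ is the PSD of $\tilde{s}_{l,1}(\upsilon,\omega)$, and $\tilde{\Phi}_{N_{Q_{l+1}}}(\upsilon,\omega)$ is the PSD of the quantization noise in band $l + 1$. The
$\tilde{\Phi}_{s_{l,\beta}}(\upsilon,\omega)$ can be defined by a set of iterative equations

$$\tilde{\Phi}_{s_{0,1}}(\upsilon,\omega) = K^2 \hat{\Phi}_L(\upsilon,\omega)\left|\hat{\tau}_d(\upsilon,\omega)\right|^2 * \hat{|||} + \tilde{\Phi}_{N_{a/d}}(\upsilon,\omega) + \tilde{\Phi}_{N_e}(\upsilon,\omega)$$

where $\hat{|||}_l = \sum_{m,n=-\infty}^{\infty} \delta(\upsilon - m/X^l, \omega - n/Y^l)$.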

2.5. Restoration 


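The Wiener-matrix restoration filters $\hat{\Psi}_\beta(\upsilon,\omega)$ 6, 15 synthesize the $N$ level-1 outputs $\tilde{s}'_{1,\beta}(\upsilon,\omega)$ into a single
continuous image $\hat{\mathcal{R}}(\upsilon,\omega)$ given by

$$\hat{\mathcal{R}}(\upsilon,\omega) = \sum_{\beta=1}^{N} \hat{\Psi}_\beta(\upsilon,\omega)\, \tilde{s}'_{1,\beta}(\upsilon,\omega) \tag{12}$$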


where 

ff(n,u;)] 

L J pa 


$^(t>,w) = $L(v,w)fJ(u,ui)^ f‘ Al i0 (u,w) [f V,^)]^, (13) 

Q = 1 

J$£(v,w)|7Vj(l>,w)| 2 7U 1 ^(tl,w)f^ li0 (l',w)| * jjj + [*Af«(t>,w)f j4li() (v,w)f^ li0 (t>,w)] ♦jit, (14) 

+ [*N a/ ifA I>( ,(t’,w)fJ lta (v 1 w)*i|l |u ] +^w 0l a (v,u)6(0,Q) t 


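$\hat{\Phi}_L(\upsilon,\omega)$, $\tilde{\Phi}_{N_e}(\upsilon,\omega)$, $\tilde{\Phi}_{N_{a/d}}(\upsilon,\omega)$ and $\tilde{\Phi}_{N_{Q_1,\alpha}}(\upsilon,\omega)$ represent the PSD of the radiance field, the photodetector
noise, the A/D quantization noise, and the level-1 quantization noise respectively. When
$\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)\,\tilde{\tau}^*_{A_1,\alpha}(\upsilon,\omega) = |\tilde{\tau}_{A_1,\beta}(\upsilon,\omega)|^2\, \delta(\beta,\alpha)$, the Wiener matrix filter reduces to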


*fl(t’,w) 


$i(t',w)f d *(i;,w)f; iflflJw) 




(15) 



The formulation of the Wiener-matrix filter takes into account the degradations due to not only the blurring and 
noise 16 - 1 ' but also due to insufficient sampling, the analysis filter response, and quantization. It also suppresses the 
blurring and raster effects in image display by interpolating the image-gathering lattice onto an image-display
lattice that is at least 4 times finer.


3. Information 

The information rate, $\mathcal{H}$, is used to evaluate the performance of the visual communication channel. The
only information the observer has about the incident radiance field is that contained in the restored image. The
degradations due to aliasing and various noise sources appear as artifacts. The information at each level in each
decomposed band, $\mathcal{H}_{l,\beta}$, is


'Hifi - Jj ' 'Hi t 0 (v,u)dvdu) = 1 Jj log 2 


1 + — 






dvduj . 


( 16 ) 


where $\hat{|||} = \sum_{m=0}^{X-1} \sum_{n=0}^{Y-1} \delta(\upsilon - m/X, \omega - n/Y)$, $m = n \neq 0$, are the sidebands of the downsampler, and the PSDs
$\tilde{\Phi}_{s_{l,\beta}}$ are given by Equation 11. The total information for the visual communication channel is


L-i n N 

H = 

1 = 1 /?= 2 P — l 


4. Quantization Algorithm 

Based on Equation 16(a) which provides the amount of information each wavelet coefficient contributes to the 
total information, a quantization scheme can be devised which either maximizes the total information for a given 
number of bits, or minimizes the total number of bits given an acceptable level of information loss. The scheme is 
as follows: 

1. Quantize all the coefficients at the maximum rate. 

2. For each coefficient, determine the loss in total information when the quantization rate for the coefficient is 
reduced by 1 bit. 

3. For the coefficient which least affects the total information, reduce the quantization rate by 1 bit. 

4. Iterate this process until either the total number of bits is exhausted, or the acceptable information loss has 
been achieved. 

Because the power spectral density used in developing these quantization measures is not the actual PSD of the signal,
but instead a statistical quantity, the quantization tables thus obtained can be used for a wide range of input scenes,
producing excellent results for input radiance fields which closely match the assumed radiance-field PSD and good
results for others.
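The four steps of the scheme above can be sketched as a greedy loop. This is a minimal illustration, not the authors' code: the per-coefficient information model is a simplified stand-in of the form one half log2(1 + signal/noise), with a uniform-quantizer noise term; the range factor `k = 3`, the noise variances, and all function and parameter names are assumptions made for the example.

```python
import numpy as np


def coefficient_information(sigma2, noise2, bits, k=3.0):
    """Assumed per-coefficient information, 0.5*log2(1 + signal/noise), in bits."""
    levels = 2.0 ** np.asarray(bits, dtype=np.float64)
    quant2 = sigma2 * k ** 2 / (3.0 * levels ** 2)  # (1/3)(k*sigma/levels)^2
    info = 0.5 * np.log2(1.0 + sigma2 / (noise2 + quant2))
    return np.where(np.asarray(bits) > 0, info, 0.0)  # zero bits carry nothing


def allocate_bits(sigma2, noise2, max_bits=8, bit_budget=None, max_loss=None):
    """Greedy allocation; `noise2` is a scalar or a 1-D array matching `sigma2`."""
    if bit_budget is None and max_loss is None:
        raise ValueError("give a bit budget, an acceptable loss, or both")
    sigma2 = np.asarray(sigma2, dtype=np.float64).ravel()
    bits = np.full(sigma2.shape, max_bits, dtype=int)  # step 1: maximum rate
    full_info = coefficient_information(sigma2, noise2, bits).sum()

    while bits.any():
        if bit_budget is not None and bits.sum() <= bit_budget:
            break  # step 4: bit budget reached
        current = coefficient_information(sigma2, noise2, bits)
        reduced = coefficient_information(sigma2, noise2, np.maximum(bits - 1, 0))
        # Step 2: information lost if each coefficient gives up one bit.
        loss = np.where(bits > 0, current - reduced, np.inf)
        idx = int(np.argmin(loss))  # step 3: the cheapest coefficient
        if max_loss is not None:
            lost_fraction = (full_info - (current.sum() - loss[idx])) / full_info
            if lost_fraction > max_loss:
                break  # step 4: next reduction would exceed the acceptable loss
        bits[idx] -= 1
    return bits
```

With `max_loss=0.05` the loop stops just before the total information would drop more than 5% below its value at the maximum rate, which mirrors the 5% acceptable-loss setting quoted for the quantization maps later in the paper.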

5. Simulation Assumptions 

Since it is virtually impossible to successfully estimate all the parameters of an image-gathering device from
the received signal, we use simulated imagery to test our algorithm. This allows us to closely control the system
characteristics which affect image quality and compression and to observe their effects in a controlled environment.

We use targets made of randomly generated polygons whose mean spatial detail $\mu$ (the average distance between
edges) is Poisson distributed and whose intensity levels are Gaussian distributed with standard deviation $\sigma_L$. The targets
with mean spatial details $\mu = 1$ and $\mu = 3$ are shown in Figure 4. The associated radiance field $L(x,y)$ is stationary
and Gaussian. We assume that both the electronic noise $N_e(x,y)$ and the A/D quantization noise $N_{a/d}(x,y)$ are
uncorrelated with the radiance field $L(x, y)$. In addition, we model the SFR of the image gathering device with a
Gaussian, $\hat{\tau}_d(\upsilon,\omega) = \exp[-(\rho/\rho_c)^2]$, where $\rho^2 = \upsilon^2 + \omega^2$, and the electro-optical design index $\rho_c$, which controls the
width of the response, and hence the tradeoff between aliasing and blurring, is the point where $\hat{\tau}_d(\upsilon,\omega) = 1/e \approx 0.37$.
When $\rho_c$ is large there is more aliasing, and when it is small there is more blurring. The restoration filters $\hat{\Psi}_\beta$ are
generated at a density at least 4 times finer than the sampling lattice to suppress the raster effects of the display
device 10 .

Figure 4: Random polygon targets: (a) $\mu = 1$ (b) $\mu = 3$

Design    $\rho_c$    $K\sigma_L/\sigma_N$    $\mathcal{H}_{\max}$
1         0.3         256                     4.4
2         0.4         64                      3.3
3         0.5         16                      2.1

Table 1: Informationally optimized visual communication channel designs.
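The Gaussian SFR model and the role of the design index can be checked numerically. The sketch below is a small illustration with hypothetical names: it evaluates exp[-(rho/rho_c)^2] at the edge of the sampling passband (0.5 cycles per sample) for the three design-index values listed in Table 1.

```python
import numpy as np


def gaussian_sfr(v, w, rho_c):
    """Gaussian SFR exp[-(rho/rho_c)^2] with rho^2 = v^2 + w^2 (cycles/sample)."""
    rho2 = np.asarray(v, dtype=np.float64) ** 2 + np.asarray(w, dtype=np.float64) ** 2
    return np.exp(-rho2 / rho_c ** 2)


# Response at the edge of the sampling passband (v = 0.5, w = 0) for the
# three design indices listed in Table 1.
for rho_c in (0.3, 0.4, 0.5):
    edge = float(gaussian_sfr(0.5, 0.0, rho_c))
    print(f"rho_c = {rho_c}: SFR at passband edge = {edge:.3f}")
```

The response at the passband edge comes out at roughly 0.06, 0.21 and 0.37 for the three designs. A larger design index therefore leaves more signal energy beyond the passband, where it aliases, while a smaller one attenuates frequencies inside the passband more strongly, which blurs the image.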

6. Results 

In order to design optimally efficient visual communication channels, i.e. channels which transmit the most
information for the least data, it is first necessary to look at optimal visual communication channels, i.e. channels
which transmit the most information given a certain image-gathering device.

Visual communication channels, rather simplistically, are generally characterized only by the signal-to-noise ratio
(SNR)* and the SFR of the combined electro-optics. Huck et al. 3, 11 have studied the problem of designing visual
communication channels for maximum information throughput in detail. Table 1 specifies three visual communication
channel designs, in terms of the SNR and the electro-optical design index $\rho_c$, that have been informationally optimized.
The amount of information each of these designs transmits is also given. We will introduce lossy wavelet compression
as an additional constraint on these communication channels and design quantization maps which maximize the
information throughput for a given bit rate, or minimize the bit rate for a given acceptable amount of information
loss.

*By this we mean the SNR of the electronic device used for image gathering. We will not consider the effects of channel noise on the
integrity of transmission.

The overall effect of lossy quantization on the transmitted signal is the introduction of an additional noise source
$\tilde{N}_{Q_l,\beta}(\upsilon,\omega)$ (Equations 13-16) which affects the visual quality of the reconstructed image. If the quantization maps
are carefully designed, these effects can be minimized. Figure 5 shows quantization maps developed using the
algorithm given in Section 4 for Designs 1 and 3 specified in Table 1. The total acceptable loss in information for
these maps is no more than 5%. The priority with which bits are encoded in each band is clearly evident in the
quantization maps. Most of the coefficients in the lowest frequency bands contain high information so they are
retained in preference to the coefficients in the higher frequency bands which contain less information. Because the
bit allocation algorithm (§4) reduces the number of bits for the coefficient based upon the amount of information
$\mathcal{H}(\upsilon,\omega)$ it contains, the initial reduction in the number of bits does not have a significant impact upon the total
information. However, as more and more bits are discarded, the information content of the affected coefficients
is higher, and the rate of information reduction increases. It is also interesting to look at the reduction in total
information as a function of the reduction in the number of bits in each band. The information content of coefficients
in band 4 is very low, hence the number of bits can be reduced by about 15% before any impact is felt on the total
information; coefficients in bands 2 and 3 have higher information content and have more of an impact on the total
information; and the coefficients in band 1 have the highest information content, as is evident from the sharp reduction
in $\mathcal{H}$ as the total number of bits for band 1 decreases.

Figure 5: Quantization maps designed with a 5% acceptable loss in information for (a) Design 3 (b) Design 1
(Table 1).

A second point of interest is the effect of the channel characteristics on $\mathcal{H}$. The total information
for all designs is a monotonically increasing function of the total number of bits being used to represent the signal. 
The rate of increase, however, is smaller at lower values of the SNR. Thus, for a given number of bits, the reduction 
in information per bit is generally highest for Design 1, and the lowest for Design 3 which suggests that greater data 
compression can be achieved for channels that have lower SNRs since the quantization effects will not overwhelm 
what is already a noisy signal. 

Figure 6(b) shows the decomposed images of the targets shown in Figure 4, and Figure 6(c) shows the restored 
images. The results are shown for an acceptable loss level of 5%. The restorations show some obvious artifacts, more 
so in the $\mu = 3$ image than in the $\mu = 1$ image. This is because the aliasing artifacts and colored noise are, to a
certain extent, masked by the detail in the scene. Conversely, one sees loss of detail in the $\mu = 1$ scene due to
the loss of high-frequency information in the quantization process. These results point to the necessity of developing 
better filter banks for the restoration of images. 

7. Conclusions 


We have presented an integrated treatment of designing quantization maps for a given visual communication
channel based upon the metric of maximizing information. Since the design of a compression algorithm should
maximize visual quality in addition to the compression ratio, it is imperative that an end-to-end system analysis be
used in both the design of the compression algorithms and in the design of the evaluation process. Though some
work has been done on incorporating the effects of image display on the perceived image quality 1 , our development is
unique in the sense that it incorporates the effects of the image gathering process into the design of the compression
algorithm.

Figure 6: (a) The blurred and noisy signals (output of the image gathering stage), (b) the decomposed signals
(output of the wavelet decomposition stage) at $L = 3$, and (c) the expected restored images. Results are shown for
restoration done from level-1 components. The original scenes are shown in Figure 4.

Though the algorithm outlined here is conceptually simple, it is computationally intensive, and for that reason the
results presented here are for the simplest case where the analysis filters are orthogonal. This reduces the number of
computations significantly while still providing good insight into the results which can be expected from
this approach. However, it does not fully achieve the image quality that is expected of this algorithm. Current
research is looking at improving the analysis/synthesis filters, as well as improving the robustness of the restoration
filters. This will lead to better image quality in terms of human perception, better restorability in terms of
the amount of detail resolved in the displayed images, and improved compression ratios.
ABSTRACT

This paper deals with the extension of information 
theory to the assessment of visual communication from 
scene to observer. The mathematical development rig- 
orously unites the electro-optical design of image gath- 
ering and display devices with the digital processing 
algorithms for image coding and restoration. Results 
show that: 

• End-to-end system analysis closely correlates with 
measurable and perceptual performance characteristics, 
such as data rate and image quality, respectively. 

• The goal of producing the best possible image 
at the lowest data rate can be realized only if (a) 
the electro-optical design of the image-gathering de- 
vice is optimized for the maximum-realizable infor- 
mation rate and (b) the image-restoration algorithm 
properly accounts for the perturbations in the visual 
communication channel. 


1. INTRODUCTION 

Modern visual communication channels increasingly 
combine image gathering and display with digital image 


coding and restoration (Fig. 1). So far, however, the 
image-gathering devices are still designed to produce 
the best possible images when reconstructed without 
the aid of the digital processing, and the image cod- 
ing and restoration algorithms are still developed and 
evaluated without fully accounting for the critical con- 
straints of image gathering and display. 

The aim of this paper is to summarize some ele- 
ments of a study [1,2] that rigorously unites the electro- 
optical design of image gathering and display devices 
with the digital processing algorithms for image coding 
and restoration. The study is based on the two classical 
works that are the foundation of modern communica- 
tion theory. In one work Shannon [3] introduces the 
concept of the rate of transmission of information in a 
noisy channel, and in the other Wiener [4] introduces 
the concept of the minimum mean-square error restora- 
tion of signals corrupted by noise. 

Although our mathematical development is firmly 
rooted in these familiar concepts, it leads to formula- 
tions that are significantly different from those that are 
found in the traditional literature on either rate distor- 
tion theory or digital image processing. These differ- 
ences arise mainly because of two critical factors that 
this literature has not addressed so far, namely:


Figure 1. Model of visual communication channel together with the critical limiting factors that constrain its performance.
(Block labels: Information Source, Image Gathering System, Image Restoration System, Destination.)





1. The limitations inherent in the realizability of the 
spatial frequency response (SFR) of optical apertures 
and the sampling passband of photodetection mecha- 
nisms impose a trade-off between blurring and aliasing 
on the design of the image-gathering device (Fig. 2). 
This precludes the treatment of visual communication 
as a bandwidth-limited process, but, instead, it requires 
the inclusion of the effects of insufficient sampling. 

2. The image-gathering process bars the encoder 
from unperturbed access to the scene (i.e., the original 
source). This precludes the application of information 
theory directly to the scene for the analysis of data 
compression and rate distortion. Instead, this theory 
now must account for the perturbations that the image- 
gathering process causes, namely, the photodetector 
and quantization noises as well as the blurring and 
aliasing. 

2. IMAGE GATHERING AND 
RESTORATION 

The image-gathering process transforms the continuous 
radiance field L(x,y) that is either reflected or emitted 
by the scene into the digital signal $s(x,y;\kappa)$, and the
image-restoration process transforms this signal into the
observed image $R(x,y;\kappa)$. In the Fourier domain, the
image-gathering process is defined as

$$\tilde{s}(\upsilon,\omega;\kappa) = [K \hat{L}(\upsilon,\omega)\,\hat{\tau}(\upsilon,\omega)] * \hat{|||} + \tilde{n}_p(\upsilon,\omega) + \tilde{n}_q(\upsilon,\omega;\kappa), \qquad (1)$$

where $\hat{L}(\upsilon,\omega)$ is the continuous radiance-field transform,
$\hat{\tau}(\upsilon,\omega)$ is the SFR of the image-gathering device,
$\tilde{n}_p(\upsilon,\omega)$ and $\tilde{n}_q(\upsilon,\omega;\kappa)$ are the discrete photodetector
and quantization noise transforms, and $(\upsilon,\omega)$ are the
spatial frequencies with units of cycles per sample. The
tilde "~" is used instead of the caret "^" whenever the
Fourier transformation is discrete and, therefore, the
transformed function is periodic in the spatial frequency
domain. The function $\hat{|||}$ is the Fourier transform of the
sampling lattice, as given by

$$\hat{|||} = \sum_{m,n} \delta(\upsilon - m, \omega - n) = \delta(\upsilon,\omega) + \hat{|||}_s,$$

where $\delta(\upsilon,\omega)$ is the Dirac delta function and $\hat{|||}_s$
accounts for the sampling sidebands. The associated
sampling passband

$$\hat{B} = \{(\upsilon,\omega);\ |\upsilon| < 1/2,\ |\omega| < 1/2\}$$

has unit area, i.e., $|\hat{B}| = 1$. The analog-to-digital
transformation is done for $\kappa$ levels with $\eta$-bit quantization,
where $\eta = \log \kappa$ and log denotes logarithm to base 2.

Figure 2. SFRs $\hat{\tau}(\upsilon,\omega)$ of the image-gathering device relative
to the sampling passband $\hat{B}$ for unit sampling intervals.

The corresponding image-restoration process is defined
as

$$\hat{R}(\upsilon,\omega;\kappa) = K^{-1}\,\tilde{s}(\upsilon,\omega;\kappa)\,\hat{\Psi}(\upsilon,\omega;\kappa) + \hat{N}_r(\upsilon,\omega), \qquad (2)$$

where $\hat{\Psi}(\upsilon,\omega;\kappa)$ is a linear filter that records the digitally
processed signal on an interpolation lattice that
is sufficiently fine to suppress the blurring and raster
effects of the image-display process and $\hat{N}_r(\upsilon,\omega)$ is
the transform of the reconstruction noise (e.g., film
granularity).

To assess visual communication in terms of informa- 
tion theory, the image-gathering process is constrained 
to be linear and isoplanatic (spatially invariant), and 
the radiance-field and noise amplitudes are constrained 
to be Gaussian, wide-sense stationary, and statistically 
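independent. In addition, we characterize: (a) the radiance
field $L(x,y)$ by the power spectral density (PSD)

$$\hat{\Phi}_L(\upsilon,\omega) = |A|^{-1}\,|\hat{L}(\upsilon,\omega)|^2$$

of an isoplanatism patch of the scene with area $|A|$,
(b) the discrete signal $s(x,y)$ prior to quantization by
the PSD

$$\tilde{\Phi}_s(\upsilon,\omega) = [K^2\,\hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2] * \hat{|||} + \tilde{\Phi}_p(\upsilon,\omega), \qquad (3)$$

(c) the photodetector noise $n_p(x,y)$ by the PSD $\tilde{\Phi}_p(\upsilon,\omega)$,
and (d) the quantization noise $n_q(x,y;\kappa)$ by the PSD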

,^;k) = (^f) . 


B = 


(4) 



where

$$\sigma_s^2 = \iint_{\hat{B}} \tilde{\Phi}_s(\upsilon,\omega)\, d\upsilon\, d\omega.$$


3. FIGURES OF MERIT 

By accounting for the critical constraints of image gath- 
ering, we can quantitatively assess visual communica- 
tion in terms of the following figures of merit: 

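1. The rate of transmission of information, or information
rate, $\mathcal{H}$ that the image-gathering system produces
for the radiance field that resides within its field
of view, as given by

$$\mathcal{H} = \frac{1}{2} \iint_{\hat{B}} \log\left[1 + \frac{\hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2}{\tilde{\Phi}_n(\upsilon,\omega;\kappa)}\right] d\upsilon\, d\omega, \qquad (5)$$

where

$$\tilde{\Phi}_n(\upsilon,\omega;\kappa) = \hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2 * \hat{|||}_s + K^{-2}\,[\tilde{\Phi}_p(\upsilon,\omega) + \tilde{\Phi}_q(\upsilon,\omega;\kappa)].$$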


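2. The theoretical minimum data rate $\mathcal{E}$ which is associated
with the information rate $\mathcal{H}$, as given by

$$\mathcal{E} = \frac{1}{2} \iint_{\hat{B}} \log\left[1 + \frac{\tilde{\Phi}_s(\upsilon,\omega)}{\tilde{\Phi}_q(\upsilon,\omega;\kappa)}\right] d\upsilon\, d\omega. \qquad (6)$$

This expression for $\mathcal{E}$ represents the entropy of completely
decorrelated data.

3. The maximum-realizable fidelity $\mathcal{F}$ of the digital
image that can be restored from the received information,
unconstrained by the image-display medium, as
given by

$$\mathcal{F} = \sigma_L^{-2} \iint \hat{\Phi}_L(\upsilon,\omega) \left[1 - 2^{-\hat{\mathcal{H}}(\upsilon,\omega)}\right] d\upsilon\, d\omega, \qquad (7)$$

where $\hat{\mathcal{H}}(\upsilon,\omega)$ is the spectral distribution of the information
rate $\mathcal{H}$ given by the integrand of Eq. (5).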
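
The information rate of Eq. (5) can be evaluated numerically once forms for the radiance-field PSD and the SFR are chosen. The following is a minimal sketch, not the authors' implementation. It assumes a unit-variance radiance-field PSD with mean spatial detail `mu`, a Gaussian SFR $\hat{\tau} = \exp[-(\rho/\rho_c)^2]$, white photodetector noise set by the SNR, and negligible quantization noise. None of these forms is given in this summary, and the function name and its parameters are hypothetical.

```python
import numpy as np

def information_rate(rho_c, snr, mu=1.0, sidebands=2, n=129):
    """Midpoint-rule estimate of H = 1/2 * integral over B of log2(1 + S/N)."""
    # Cell-centred frequency grid over the sampling passband |v|, |w| < 1/2.
    f = (np.arange(n) + 0.5) / n - 0.5
    v, w = np.meshgrid(f, f, indexing="ij")

    def throughput(vv, ww):
        rho2 = vv**2 + ww**2
        # ASSUMED radiance-field PSD (unit variance, mean spatial detail mu).
        phi_l = 2 * np.pi * mu**2 / (1 + (2 * np.pi * mu) ** 2 * rho2) ** 1.5
        # ASSUMED Gaussian SFR tau = exp(-(rho/rho_c)^2), so |tau|^2 = exp(-2 rho^2 / rho_c^2).
        return phi_l * np.exp(-2 * rho2 / rho_c**2)

    signal = throughput(v, w)

    # Aliased signal: the same spectrum shifted to the sampling sidebands (m, k) != (0, 0).
    aliased = np.zeros_like(signal)
    for m in range(-sidebands, sidebands + 1):
        for k in range(-sidebands, sidebands + 1):
            if m != 0 or k != 0:
                aliased += throughput(v - m, w - k)

    # ASSUMED white photodetector noise referred to the input; quantization noise ignored.
    noise = 1.0 / snr**2

    # The passband has unit area, so the grid mean approximates the integral.
    return 0.5 * np.mean(np.log2(1 + signal / (aliased + noise)))
```

Sweeping `rho_c` for a few values of `snr` gives curves of the same kind as those discussed below, although the numbers depend entirely on the assumed PSD and SFR.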

Reference 2 fills in the many details. It also formulates
the information rate $\mathcal{H}_o$ and the maximum-realizable
fidelity $\mathcal{F}_o$ of the observed image that the
image-restoration system produces from the received
information on an image-display medium (e.g., film).
In addition, it accounts for multiresolution decomposition
(wavelet transform) to optimally integrate the economical
encoding of the transmitted signal with image
gathering and restoration. Finally, Ref. 5 applies the
information-theoretic assessment to the electro-optical
design of the image-gathering device. It accounts for (a)
the f-number, diffraction, and transmittance shading of
the objective lens, (b) the sensitivity, aperture shape,
and sampling geometry of the photodetection mechanism,
and (c) the dynamic-range compression with lateral
inhibition in the focal plane.

Figure 3 characterizes the information rate $\mathcal{H}$ as
a function of the electro-optical design of the image-gathering
device, as specified by the optical-design index
$\rho_c$ and the rms signal-to-noise ratio (SNR). The
curves show that the preferred SFR is a function of
the SNR. This result is intuitively appealing for image
restoration. In one extreme, when the SNR is low,
one would prefer to avoid substantial blurring because
the noise constrains the enhancement of fine spatial detail.
In the other extreme, when the SNR is high, one
would prefer to avoid substantial aliasing because then
the noise no longer constrains this enhancement.

Figure 4 presents an information-entropy $\mathcal{H}(\mathcal{E})$ plot
that characterizes the information rate $\mathcal{H}$ versus the
associated theoretical minimum data rate $\mathcal{E}$ for $\eta$-bit
quantization and three informationally optimized designs
of the image-gathering device. This plot serves
as a useful alternative to the familiar rate-distortion
function, which is based on the premise that the encoder
has unperturbed access to the original source and,
therefore, directly controls the trade-off between distortion
and data rate. The curves show that the
electro-optical design that increases $\mathcal{H}$ also decreases
the associated $\mathcal{E}$ and, thereby, substantially improves
the information efficiency $\mathcal{H}/\mathcal{E}$ of the data transmission.


Figure 3. Information rate $\mathcal{H}$ versus optical-design index $\rho_c$
for several SNRs.

Figure 4. The information-entropy $\mathcal{H}(\mathcal{E})$ plot that characterizes
the information rate $\mathcal{H}$ versus the associated
theoretical minimum data rate $\mathcal{E}$ for $\eta$-bit quantization.
The three curves represent informationally optimized
designs specified by the optical-design index $\rho_c$ and SNR.

Figure 5 presents images that illustrate the transi- 
tion from traditional telephotography and television in 
which images are reproduced without digital process- 
ing to modern visual communication systems in which 
images are reproduced with digital restoration. 

4. CONCLUSION 

The image-gathering device that is designed to
produce the maximum-realizable information rate ordinarily
maximizes (a) the efficiency of the information
transmission (i.e., the ratio of the information rate $\mathcal{H}$ to
the theoretical minimum data rate $\mathcal{E}$), (b) the quality
of the image restoration (i.e., the restorability of images
for fidelity, resolution, sharpness, and clarity), and
(c) the robustness of the image restoration (i.e., the tolerance
of the restoration to errors in estimates of the
radiance-field statistics). This critical dependence of
the efficiency, quality, and robustness of visual commu- 
nication on the design of the image-gathering device is 
largely independent of the statistical properties of nat- 
ural scenes. 
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Figure 5. Reconstructions and Wiener restorations for two designs of the image-gathering device, the traditional 
design ($\rho_c$ = 0.8, SNR = 16) and an informationally optimized design ($\rho_c$ = 0.4, SNR = 64).
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ABSTRACT 

The retinex is a human perception-based image pro- 
cessing algorithm which provides color constancy and 
dynamic range compression. We have previously re- 
ported on a single-scale retinex (SSR) and shown that it 
can either achieve color/lightness rendition or dynamic 
range compression, but not both simultaneously. We 
now present a multi-scale retinex (MSR) which over- 
comes this limitation for most scenes. Both color ren- 
dition and dynamic range compression are successfully 
accomplished except for some “pathological” scenes 
that have very strong spectral characteristics in a single 
band. 

1. INTRODUCTION 

A common problem with color imagery — digital or 
analog — is that of successful capture of the dynamic 
range and colors seen through the viewfinder onto the 
acquired image. More often than not, this image is a 
poor rendition of the actual observed scene. In 1986, 
Edwin Land presented the last version of his retinex[1]
as a model for human color constancy. Hurlbert[2, 3] 
showed that there is no mathematical solution to the 
problem of removing lighting variations. Moore[4, 5] 
implemented a version of the retinex in analog VLSI for 
real-time dynamic range compression but encountered 
scene context dependent limitations and hence failed to 
achieve a generalized implementation. More recently 
we, inspired by the work of Land, Hurlbert, and Moore,
decided to delve into this commonly occurring, but
surprisingly intractable, problem. Our initial research 
resulted in the single-scale retinex (SSR) that we have 
described in detail previously[6, 7, 8]. The SSR shows 
exceptional promise for dynamic range compression 
but does not provide good tonal rendition. In fact, a 
distinct trade-off controlled by the scale of the surround 
function exists between dynamic range compression 
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and tonal rendition, and one can be improved only at 
the cost of reducing the other. 

This paper describes our initial research in allevi- 
ating some of these trade-offs by using a multi-scale 
retinex (MSR), i.e. a retinex which combines several 
SSR outputs to produce a single output image which 
has both good dynamic range compression and color 
constancy, and good tonal rendition. The tonal rendition,
though, is still scene dependent to a certain extent.
We will briefly describe the MSR in Section 2. In
Section 3 we will provide some of the results of applying
the MSR to color images and compare our results
with other techniques for image enhancement. Finally, 
in Section 4 we will discuss the future direction for this 
research. 

2. THE MULTI-SCALE RETINEX 

The MSR can be compactly written as

$$F_i(x,y) = \sum_{n=1}^{N} W_n \cdot \{\log[S_i(x,y)] - \log[S_i(x,y) * M_n(x,y)]\}, \qquad (1)$$

where the subscripts $i \in \{R, G, B\}$ represent the three
color bands, $S_i(x,y)$ is the image in the $i$th band, $*$ denotes
convolution, $N$ is the number of scales being used,
and $W_n$ are the weighting factors for the scales. The
$M_n(x,y)$ are the surround functions given by

$$M_n(x,y) = K_n \exp[-(x^2 + y^2)/\sigma_n^2],$$

where the $\sigma_n$ are the standard deviations of the Gaussian
distribution that determine the scale. The magnitude
of the scale determines the type of information
that the retinex provides: smaller scales provide more
dynamic range compression, and larger scales provide
more color constancy. The $K_n$ are selected so that
$\iint M_n(x,y)\,dx\,dy = 1$. Each of the expressions within the
summation in Eq. 1 represents an SSR.
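
Eq. 1 translates fairly directly into array code. The sketch below is one possible reading, not the implementation used for the results in this paper. The scale values (other than the 80-pixel space constant mentioned below), the equal weights, the offset `eps` that avoids log(0), and the use of FFT-based circular convolution (which treats the image as periodic) are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def gaussian_surround(shape, sigma):
    """M_n(x, y) = K_n * exp(-(x^2 + y^2) / sigma^2), with K_n chosen so the kernel sums to 1."""
    h, w = shape
    # Wrapped integer offsets (0, 1, ..., -2, -1) so the kernel is centred at pixel (0, 0).
    y = np.fft.fftfreq(h) * h
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    m = np.exp(-(xx**2 + yy**2) / sigma**2)
    return m / m.sum()

def multiscale_retinex(image, sigmas=(15.0, 80.0, 250.0), weights=None, eps=1.0):
    """Weighted sum of single-scale retinex outputs for an H x W x 3 image."""
    image = np.asarray(image, dtype=np.float64) + eps  # ASSUMED offset to avoid log(0)
    if weights is None:
        weights = np.full(len(sigmas), 1.0 / len(sigmas))  # ASSUMED equal weights W_n
    out = np.zeros_like(image)
    for w_n, sigma in zip(weights, sigmas):
        kernel_ft = np.fft.rfft2(gaussian_surround(image.shape[:2], sigma))
        for i in range(image.shape[2]):
            band = image[..., i]
            # Circular convolution S_i * M_n via the FFT.
            surround = np.fft.irfft2(np.fft.rfft2(band) * kernel_ft, s=band.shape)
            surround = np.maximum(surround, 1e-6)  # guard against round-off below zero
            # The log is applied after the surround is formed.
            out[..., i] += w_n * (np.log(band) - np.log(surround))
    return out
```

Because each term is a difference of logarithms, multiplying the input by a constant (with `eps=0`) leaves the output unchanged, which is the sense in which the retinex discounts a uniform change in illumination. The output still needs a final gain-offset before display.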

The SSR has been previously defined [6] to have the 
following characteristics and properties: 



1. The functional form of the surround is a Gaus- 
sian. 

2. The placement of the log function is after sur- 
round formation. 

3. The post-retinex signal processing is a "canonical"
gain-offset rather than an automatic
gain-offset.

4. There is a trade-off between dynamic range com- 
pression and tonal rendition which is governed by 
the Gaussian surround space constant. A space 
constant of 80 pixels was a reasonable compro- 
mise between dynamic range compression and 
rendition. 

5. A single scale seemed incapable of simultaneously 
providing sufficient dynamic range compression 
and tonal rendition. 

6. Violations of the gray-world assumption led to
retinexed images which were either "grayed-out"
locally or globally or, more rarely, suffered from
color distortion.

The MSR combines the dynamic range compression 
of the small scale retinex with the tonal rendition of 
the large scale retinex to produce an output which 
encompasses both. 

As stated above, the MSR still suffers from graying-out
of uniform zones much as the SSR did. The advantage
that the MSR has over the SSR is in the combination
of scales which provide both dynamic range
compression and tonal rendition at the same time. The
overall result of the application of the MSR is still more
saturated than human observation, giving the final image
a "washed-out" appearance, but it preserves most
of the detail in the scene. This "graying" of areas
of constant intensity occurs because the retinex processing
enhances each color band as a function of its
surround. The smaller values in the weaker channels
get "pushed" up strongly, making them approximately
equal in magnitude to the dominant channel, leading
to a graying out of the overall region. Moore[4] encountered
this problem in his implementation of the
retinex and attempted to resolve it by using variable
gains across the color channels. We do not attempt a
solution in this paper but provide a detailed solution
elsewhere[9]. However, the MSR produces a much better
final image in terms of color and dynamic range
than the SSR. Figure 1 shows a comparison of the SSR
and the MSR processing. The differences are easier to
see in the original color images (see CD-ROM version
of paper), but if one looks around the left side of the
face and in the area just above the right shoulder of
the pictured man, one sees details for the MSR which
are not evident in the SSR. Also the "haloing" artifacts
peculiar to the SSR are eliminated in the MSR.

Figure 1: (a) Original (b) Single-scale Retinex (c)
Multi-scale Retinex

3. RESULTS 

Figure 3 shows a comparison of the MSR with image 
enhancement methods typically used for dynamic range 
compression. The scenes are selected to show the ef- 
fects of MSR processing on “good images” (top row), 
wide dynamic range compression that is achieved by 
the MSR (middle row), and color constancy (bottom 
row). Histogram equalization performs well for the 
child image, but begins to saturate in both the grass im- 
age and the cave image. The logarithmic non-linearity 
has the poorest performance for all three scenes, though 
its dynamic range compression capabilities are quite ev- 
ident in the grass scene. For the MSR processing, the 
uniform regions in the child scene tend to gray out, 
but the overall result is still quite good. For the grassy 
field, the MSR processing compresses the wide dynamic 
range well and brings out the colors in both the bright 
and the dark areas very well. For the cave image, the 
color of the inside rock, and the outside rock forma- 
tions are both brought out so they agree with actual 
observation. The CD-ROM version of the proceedings 
contains the color postscript figures and the compar- 
isons are much easier to make. 
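
For reference, the two conventional point operations named above can be sketched as follows. This is a generic illustration: the exact forms of histogram equalization and of the logarithmic non-linearity used to produce the comparison figure are not stated in this paper, so the normalization to an 8-bit range and the function names are assumptions.

```python
import numpy as np

def log_nonlinearity(channel, max_val=255.0):
    """Point-wise log compression, scaled so that 0 maps to 0 and max_val maps to max_val."""
    channel = np.asarray(channel, dtype=np.float64)
    return max_val * np.log1p(channel) / np.log1p(max_val)

def histogram_equalize(channel, levels=256):
    """Global histogram equalization of one 8-bit band (integer dtype required)."""
    channel = np.asarray(channel)
    hist = np.bincount(channel.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / channel.size
    # Look-up table that maps each input level to its scaled cumulative frequency.
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[channel]
```

Both operations apply one global mapping to every pixel regardless of its surround, which is why they tend to lift the darks only at the expense of the brights.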

The MSR output brings out most of the detail in the 
black regions but at the cost of enhancing the noise in 
these regions. This noise is a result of the poor signal- 
to-noise ratio in these areas. The traditional techniques 
are also able to enhance the dark regions, but not to 
the same extent as the MSR. In fact, the MSR achieves 
a balance between enhancing the darks, yet, at the 
same time, retaining the colors in the bright regions, 
as opposed to traditional point non-linearities which 
tend to enhance the darks at the cost of saturating the 
brights (Figs. 3(b,c)). Of course, the final rendition is
still scene-dependent and can often be grayed-out if the
original scene contains large areas of constant intensity 
(Fig. 3(d)(top row)). 

The MSR output is different from existing tech- 
niques in that the overall effect of processing is scene 
dependent but the processing itself is not. In other 
words, though the overall effect adapts itself to the 
lighting variations within the scene, the same process, 
with exactly the same control parameters can be used 
for any image. This is not true for other adaptive techniques
since variations in lighting conditions imply variations
in the control parameters.


4. FUTURE RESEARCH 

The main direction of further research is to improve 
the color rendition of the MSR. Though it produces 
excellent dynamic range compression, the tonal rendi- 
tion is scene dependent and can be quite poor. Work 
is already underway on a newer version of the MSR 
which combines a post-filter with the MSR to produce 
an MSR which provides very good color rendition with
a very slight loss in overall dynamic range compression. 
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Abstract 


Recorded color images differ from direct human viewing by the lack of dynamic range compression and color constancy. 
Research is summarized which develops the center/surround retinex concept originated by Edwin Land through a single- 
scale design to a multi-scale design with color restoration (MSRCR). The MSRCR synthesizes dynamic range 
compression, color constancy, and color rendition and, thereby, approaches fidelity to direct observation. 

Introduction 

The recorded color image and the "view through the viewfinder" are strikingly different (Fig. 1) for most
everyday scenes due to the presence of shadows. Color and detail in shadows are far clearer in the direct view than in
recorded images. We have developed Land's concept of a center/surround retinex¹ to the level of single-scale retinex
(SSR) design for which there is a trade-off between dynamic range compression and tonal rendition that is governed by 
the choice of the surround space constant. Comparison of processed images to direct scene viewing established that no 
value of an intermediate space constant could simultaneously provide sufficient dynamic range compression and good 
tonal rendition. The single-scale retinex provided a building block for the construction of a multi-scale retinex which 
does couple acceptable dynamic range compression with good tonal rendition. Color constancy is excellent for all forms 
of the retinex but color rendition was elusive as a result of the gray world assumption implicit to the retinex computation. 
A color restoration was developed and applied after the multiscale retinex in order to overcome this color loss but with a 
modest dilution in color constancy. 



Figure 1. The discrepancy between recorded images and direct
observation (panels: Recorded, Observed). Human vision strongly compresses visual information across
wide-ranging illumination conditions within a scene.



Methods and Results 


Here we briefly highlight results that are described comprehensively elsewhere²⁻⁵. The design of the single-scale retinex
(SSR) consists of: 1) the choice of a surround function, 2) the placement of the log function, and 3) final signal
processing prior to display/print. Of the three mathematical functions¹,⁶,⁷ previously used for a retinex surround, we
found the best visual performance with the Gaussian compared to either the exponential form or the inverse square form
originally used by Land. Unlike previous studies, we find that the placement of the log function is quite important in both
mathematical and visual terms. We show that its placement after the surround formation is preferable to placement prior
to surround formation. Processing after the basic spatial retinex operation was found to be a "canonical" gain-offset
applied uniformly to all color bands rather than an auto gain-offset¹ calculated across the full three band data. These
elements lead to an SSR defined as:


$$R_i(x,y) = \log I_i(x,y) - \log[F(x,y) * I_i(x,y)] \qquad (1)$$

where $I_i(x,y)$ is the image distribution in the ith color band, * denotes convolution, and F(x,y) is a Gaussian surround
function. This is followed by the constant gain-offset applied across all color bands which, thus far, has proven to be 
universally constant or "canonical" for all images tested. This characteristic provides for general purpose and automatic 
application of the method and for simple construction of a multi-scale retinex as: 

$$R_{M_i}(x,y) = \sum_{k=1}^{n} W_k\, R_{k_i}(x,y) \qquad (2)$$


for the kth surround space constant. The design of the multi-scale retinex was found to require a minimum of three scales 
for image frame sizes of about 512x512 pixels. A comparison of direct viewing of scenes to scene photometry 
established that dynamic range compression for human vision is typically 5:1 or so for outdoor scenes with shadows and
easily achieves 500:1 for mixed interior/exterior scenes. From this it is evident that everyday scenes often exceed the 
255:1 (8-bit) dynamic range of most color imaging systems and that wide dynamic range color imaging, together with 
the retinex, or other compressive processing, is essential if recorded color images are to approach the quality of 
observation. The use of test scenes together with a battery of diverse digital images revealed that the violations of the 
gray world assumption implicit to the retinex were a common occurrence both zonally and globally in images. The 
degree of impact on color rendition ranges from slight desaturation of color to rather severe graying for the extreme 
cases of "monochromatic" scenes. Therefore a color restoration that could be universally applied was developed because 
scene content is not predictable. Thus the MSRCR is given by: 


$$R_{M_i}'(x,y) = R_{M_i}(x,y) \cdot I_i'(x,y) \qquad (3)$$

where the color restoration, $I_i'(x,y)$, is:

$$I_i'(x,y) = \log\left[\frac{C\, I_i(x,y)}{\sum_{i=1}^{3} I_i(x,y)}\right] \qquad (4)$$


The current form of the MSRCR does compare favorably with direct viewing by synthesizing dynamic range 
compression and color constancy with color and tonal rendition. 
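
The color restoration step can be sketched as below. This is a hedged illustration that reads Eq. (4) as the logarithm of each band's share of the summed intensity, scaled by a constant C, and reads Eq. (3) as a pixel-wise product. The value of C, the offset `eps`, and the function names are assumptions for illustration only; the multi-scale retinex output is passed in as an argument so that the sketch stays self-contained.

```python
import numpy as np

def color_restoration(image, C=100.0, eps=1.0):
    """I'_i = log(C * I_i / sum_i I_i) for an H x W x 3 image.

    C and eps are ASSUMED values, not taken from the paper.
    """
    image = np.asarray(image, dtype=np.float64) + eps  # offset avoids log(0) and 0/0
    total = image.sum(axis=2, keepdims=True)
    return np.log(C * image / total)

def msrcr(image, msr_output, C=100.0):
    """Pixel-wise product of a precomputed multi-scale retinex output and the color restoration."""
    return np.asarray(msr_output, dtype=np.float64) * color_restoration(image, C)
```

For a gray pixel the three bands receive the same factor, so the restoration leaves neutral regions neutral, while a band that dominates its pixel receives a larger factor than the weaker bands.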


Applications 





We isolate two applications of the MSRCR to illustrate a wider range of applications: aerospace image enhancements
and digital photoprocessing. The MSRCR can be used to advantage in both space operations and remote sensing (Fig. 2). 
For the former, the often dramatic lighting variations present in space operations can be ameliorated and better visual 
information achieved. For remote sensing, the MSRCR brings out the visual information present in large shadow zones 
and large zones of low reflectance, such as water areas. An example of an enhancement for improved documentation of
aeronautical research is also shown. The automatic correction of low exposure images is evident and is useful for digital 
photoprocessing. 





Figure 2. Enhancement of aerospace images using the MSRCR: 
a) Shuttle operations, b) remote sensing, and c) aeronautical research 
documentation. 


The MSRCR can be used as a "digital darkroom", allowing burning and dodging of areas that would have been 
extremely labor intensive if not impossible using traditional darkroom techniques. Although there are software packages 
that allow the selective lightening and darkening of specific areas of digitized images, in the cases below, it would be 
impractical because of the degree of detail required in the selection of these areas and the different changes required for
each selection. 
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Figure 3. The MSRCR as a digital photo-processing method: graceful 
automatic "burning and dodging":

a) Underwater image from traditional film camera, b) Image from satellite 
data, c) digitized image from nuclear magnetic resonance (NMR) film. 
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Abstract 

The goal of image gathering and restoration often is to pro- 
duce the best possible picture in terms of fidelity, sharpness 
and clarity. However, this goal cannot be attained, as it
has been pursued in the past, by treating image gathering 
and restoration as independent tasks. Instead, in a clean 
departure from the mores of traditional image processing, 
we present an approach that rigorously uses modern com- 
munication theory to optimally combine the electro-optical 
design of the image gathering device with the digital pro- 
cessing algorithm for image restoration. Extensive sim- 
ulations have shown that there exists a strong correlation 
between the information rate that is produced by the im- 
age gathering device and the image quality with which an 
image can be restored. 

Introduction 

Modern visual communication channels increasingly com- 
bine image gathering and display with digital image cod- 
ing and restoration (Fig. 1). So far, however, the image- 
gathering devices are still designed to produce the best 
possible images when reconstructed without the aid of the 
digital processing, and the image restoration algorithms 
are still developed and evaluated without fully accounting 
for the critical constraints of image gathering and display. 

The aim of this paper is to summarize some elements
of a study¹,²,³ that rigorously unites the electro-optical design
of image gathering and display devices with the digital
processing algorithms for image coding and restoration.
In particular, this paper will present the informationally
optimized image-gathering designs that maximize image
quality in terms of clarity and sharpness of fine detail.
The study is based on the two classical works that are the
foundation of modern communication theory. In one work
Shannon⁴ introduces the concept of the rate of transmission
of information in a noisy channel, and in the other
Wiener⁵ introduces the concept of the minimum mean-square
error restoration of signals corrupted by noise.

Although our mathematical development is firmly rooted
in these familiar concepts, it leads to formulations that are
significantly different from those that are found in the traditional
literature on digital image processing. One fundamental
difference, which we address in this summary
paper, arises primarily because of the limitations inherent in
the realizability of the spatial frequency response (SFR) of
optical apertures and the sampling passband of photodetection
mechanisms. These limitations inevitably impose a
trade-off between blurring and aliasing on the design of
the image-gathering device (Fig. 2). This precludes the
treatment of visual communication as strictly a bandwidth-limited
process. Instead, it requires the inclusion of the effects
of insufficient sampling both in the end-to-end analysis
of the visual communication channel and in the development
of the restoration algorithm.
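
The trade-off between blurring and aliasing can be made concrete with a small numerical sketch. It assumes, purely for illustration, a Gaussian SFR $\hat{\tau} = \exp[-(\rho/\rho_c)^2]$ and a spectrally flat scene, neither of which is specified in this summary, and the function name is hypothetical. The sketch splits the energy passed by the SFR into the part that falls inside the unit sampling passband and the part that falls outside it and is therefore folded back as aliasing.

```python
import numpy as np

def blur_alias_split(rho_c, extent=3.0, n=601):
    """Return (in-band energy, out-of-band energy) passed by an ASSUMED Gaussian SFR."""
    f = np.linspace(-extent, extent, n)
    v, w = np.meshgrid(f, f, indexing="ij")
    # |tau|^2 for tau = exp(-(rho / rho_c)^2); a flat scene spectrum is ASSUMED.
    tau2 = np.exp(-2.0 * (v**2 + w**2) / rho_c**2)
    in_band = (np.abs(v) < 0.5) & (np.abs(w) < 0.5)  # sampling passband for unit intervals
    cell = (f[1] - f[0]) ** 2
    passed_in_band = tau2[in_band].sum() * cell  # close to 1 means little blurring
    aliased = tau2[~in_band].sum() * cell        # energy outside the passband
    return passed_in_band, aliased
```

Under these assumptions, widening the SFR raises both numbers together: less of the in-band detail is blurred away, but more out-of-band energy is admitted as aliasing.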

Image gathering and restoration 

The image-gathering process transforms the continuous radiance
field $L(x,y)$ that is either reflected or emitted by the
scene into the digital signal $s(x,y;\kappa)$,

$$s(x,y;\kappa) = [K L(x,y) * \tau(x,y)]\,||| + n_p(x,y) + n_{a/d}(x,y;\kappa), \qquad (1)$$

where $\tau(x,y)$ represents the spatial response of the image-gathering
device, $n_p(x,y)$ is the discrete photodetector noise,
$n_{a/d}$ is the analog-to-digital (A/D) conversion noise, and
$\kappa$ represents the number of levels used for the A/D conversion.
In the Fourier domain, the image-gathering process
is defined as

$$\tilde{s}(\upsilon,\omega;\kappa) = [K \hat{L}(\upsilon,\omega)\,\hat{\tau}(\upsilon,\omega)] * \hat{|||} + \tilde{n}_p(\upsilon,\omega) + \tilde{n}_{a/d}(\upsilon,\omega;\kappa), \qquad (2)$$

where $\hat{L}(\upsilon,\omega)$ is the continuous radiance-field transform,
$\hat{\tau}(\upsilon,\omega)$ is the SFR of the image-gathering device, $\tilde{n}_p(\upsilon,\omega)$
and $\tilde{n}_{a/d}(\upsilon,\omega;\kappa)$ are the discrete photodetector and analog-to-digital
(A/D) conversion noise transforms, and $(\upsilon,\omega)$
are the spatial frequencies with units of cycles per sample.
The tilde "~" is used instead of the caret "^" whenever the
Fourier transformation is discrete and, therefore, the transformed
function is periodic in the spatial frequency domain.

Figure 1. Model of visual communication channel together with the critical limiting factors that constrain its performance.

Figure 2. SFRs $\hat{\tau}(\upsilon,\omega)$ of the image-gathering device relative
to the sampling passband $\hat{B}$ for unit sampling intervals.

The function $\hat{|||}$ is the Fourier transform of the rectangular
sampling lattice with unit intervals, and is given
by

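$$\hat{|||} = \sum_{m,n} \delta(\upsilon - m, \omega - n) \qquad (3)$$

$$\quad = \delta(\upsilon,\omega) + \hat{|||}_s(\upsilon,\omega), \qquad (4)$$

where $\delta(\upsilon,\omega)$ is the Dirac delta function and $\hat{|||}_s(\upsilon,\omega)$ accounts
for the sampling sidebands. The associated sampling passband

$$\hat{B} = \{(\upsilon,\omega);\ |\upsilon| < 1/2,\ |\omega| < 1/2\}$$

has unit area, i.e., $|\hat{B}| = 1$. The analog-to-digital transformation
is done for $\kappa$ levels with $\eta$-bit quantization, where
$\eta = \log_2 \kappa$. The image-restoration process transforms
this signal into the observed image $R(x,y;\kappa)$. The corresponding
image-restoration process in the spatial frequency
domain is defined as

$$\hat{R}(\upsilon,\omega;\kappa) = K^{-1}\,\tilde{s}(\upsilon,\omega;\kappa)\,\hat{\Psi}(\upsilon,\omega;\kappa) + \hat{N}_r(\upsilon,\omega), \qquad (5)$$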

where $\hat{\Psi}(\upsilon,\omega;\kappa)$ is a linear filter that records the digitally
processed signal on an interpolation lattice that is sufficiently
fine to suppress the blurring and raster effects of
the image-display process and $\hat{N}_r(\upsilon,\omega)$ is the transform
of the reconstruction noise (e.g., film granularity).

To assess visual communication in terms of informa- 
tion theory, the image-gathering process is constrained to 
be linear and isoplanatic (spatially invariant), and the radiance- 
field and noise amplitudes are constrained to be Gaussian, 
wide-sense stationary, and statistically independent. In addition,
we characterize: (a) the radiance field $L(x,y)$ by
the power spectral density (PSD)

$$\hat{\Phi}_L(\upsilon,\omega) = |A|^{-1}\,|\hat{L}(\upsilon,\omega)|^2$$

of an isoplanatism patch of the scene with area $|A|$, (b) the
discrete signal $s(x,y)$ prior to A/D conversion by the PSD

$$\tilde{\Phi}_s(\upsilon,\omega) = [K^2\,\hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2] * \hat{|||} + \tilde{\Phi}_p(\upsilon,\omega), \qquad (6)$$

(c) the photodetector noise $n_p(x,y)$ by the PSD $\tilde{\Phi}_p(\upsilon,\omega)$,
and (d) the A/D conversion noise $n_{a/d}(x,y;\kappa)$ by the PSD

$a/d(v,uj;K) = , (7) 


where

$$\sigma_s^2 = \iint_{\hat{B}} \tilde{\Phi}_s(\upsilon,\omega)\, d\upsilon\, d\omega.$$


Figures of merit 

By accounting for the critical constraints of image gather- 
ing, we can quantitatively assess visual communication in 
terms of the following figures of merit: 

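1. The rate of transmission of information, or information
rate, $\mathcal{H}$ that the image-gathering system produces
for the radiance field that resides within its
field of view, as given by

$$\mathcal{H} = \frac{1}{2} \iint_{\hat{B}} \log\left[1 + \frac{\hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2}{\tilde{\Phi}_n(\upsilon,\omega;\kappa)}\right] d\upsilon\, d\omega, \qquad (8)$$

where

$$\tilde{\Phi}_n(\upsilon,\omega;\kappa) = \hat{\Phi}_L(\upsilon,\omega)\,|\hat{\tau}(\upsilon,\omega)|^2 * \hat{|||}_s + K^{-2}\,[\tilde{\Phi}_p(\upsilon,\omega) + \tilde{\Phi}_{a/d}(\upsilon,\omega;\kappa)]. \qquad (9)$$

2. The maximum-realizable fidelity $\mathcal{F}$ of the digital image
that can be restored from the received information,
unconstrained by the image-display medium,
as given by

$$\mathcal{F} = \sigma_L^{-2} \iint_{-\infty}^{\infty} \hat{\Phi}_L(\upsilon,\omega) \left[1 - 2^{-\hat{\mathcal{H}}(\upsilon,\omega)}\right] d\upsilon\, d\omega, \qquad (10)$$

where $\hat{\mathcal{H}}(\upsilon,\omega)$ is the spectral distribution of the information
rate $\mathcal{H}$ given by the integrand of Eq. 8.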

Figure 3 characterizes the information rate $\mathcal{H}$ as a function
of the electro-optical design of the image-gathering
device, as specified by the optical-design index $\rho_c$ and the
root-mean-square (rms) signal-to-noise ratio (SNR) for a
radiance field with mean spatial detail equal to the sampling
interval. The curves show that the SFRs that maximize
the information rate $\mathcal{H}$ are a function of the SNR. This result
is intuitively appealing for image restoration. In one extreme,
when the SNR is low, one would prefer to avoid
substantial blurring (the SFR extends well beyond the sampling
passband, hence $\rho_c$ is large) because the noise constrains
the enhancement of fine spatial detail. In the other
extreme, when the SNR is high, one would prefer to avoid
substantial aliasing (the SFR remains mostly inside the sampling
passband, hence $\rho_c$ is small) because then the noise
no longer constrains this enhancement.

Using the curves shown in Fig. 3, visual communication
channels can be specified, in terms of their SNR and
the SFR of the image-gathering device, that maximize the
information throughput. Huck et al.¹,³ show that visual
communication channels that are designed to maximize
the information throughput also maximize the quality of
the restored image in terms of sharpness and clarity of the
fine detail. Table 1 lists the electro-optical designs, each specified
by an SNR and an SFR parameterized by the index
$\rho_c$, that maximize information throughput. Conventional
image gathering typically has an SFR with $\rho_c = 0.80$.


Figure 4 presents images that illustrate the transition
from traditional telephotography and television, in which
images are reproduced without digital processing, to
informationally optimized visual communication systems, in
which images are reproduced with digital restoration.


Figure 3. Information rate $\mathcal{H}$ versus optical-design index $\rho_c$
for several SNRs.


Design    SNR    $\rho_c$
  1       256    0.30
  2        64    0.40
  3        16    0.60

Table 1: Channel designs that maximize information throughput.


Conclusion 

The image-gathering device that is designed to produce the 
maximum-realizable information rate maximizes both the 
quality of the image restoration (i.e., the restorability of
images for fidelity, resolution, sharpness, and clarity), and 
the robustness of the image restoration (i.e., the tolerance 
of the restoration to errors in estimates of the radiance- 
field statistics). This critical dependence of the quality and 
robustness of visual communication on the design of the 
image-gathering device is largely independent of the sta- 
tistical properties of natural scenes. 
Abstract

The multiscale retinex with color restoration (MSRCR) 
has shown itself to be a very versatile automatic image 
enhancement algorithm that simultaneously provides dy- 
namic range compression, color constancy, and color ren- 
dition. A number of algorithms exist that provide one 
or more of these features, but not all. In this paper we 
compare the performance of the MSRCR with techniques 
that are widely used for image enhancement. Specifically, 
we compare the MSRCR with color adjustment methods 
such as gamma correction and gain/offset application, his- 
togram modification techniques such as histogram equal- 
ization and manual histogram adjustment, and other more 
powerful techniques such as homomorphic filtering and 
‘burning and dodging’. The comparison is carried out by 
testing the suite of image enhancement methods on a set 
of diverse images. We find that though some of these 
techniques work well for some of these images, only the 
MSRCR performs universally well on the test set. 

Introduction 

The Multiscale Retinex¹ (MSR) is a generalization of the
single-scale retinex²⁻⁴ (SSR), which, in turn, is based upon
the last version of Land’s center/surround retinex⁵. The
current version of the MSR combines the retinex dynamic
range compression and color constancy with a color
‘restoration’ filter that provides excellent color rendition⁶⁻⁸.
This version of the MSR is called the Multiscale Retinex with
Color Restoration (MSRCR). The MSRCR has been tested
with a very large suite of images and has consistently proven
to be better than any conventional image enhancement
technique. In this paper we present a comparison of the MSRCR
with several of the most popular image enhancement methods.
These include point transforms such as automatic
gain/offset, non-linear gamma correction, and non-linear
intensity transforms such as the logarithmic transform or the
‘square-root’ transform; and global transforms such as
histogram equalization⁹, homomorphic filtering¹⁰, and manual
‘burning and dodging.’


State-of-the-art Techniques 

In this section we briefly describe the characteristics of 
some of the state-of-the-art techniques most commonly used 
for image enhancement. 

Gain/offset correction 

One of the most common methods of enhancing an image
is the application of a gain and an offset to stretch
the dynamic range of the image. This is a linear operation
and hence has limited success on scenes that encompass a
much wider dynamic range than can be displayed.
In this case, detail is lost to saturation and clipping
as well as to poor visibility in the darker regions
of the image. For a scene with dynamic range between
$r_{\max}$ and $r_{\min}$, and a display medium with dynamic range
$d_{\max}$, this transform can be represented by

$$I_i'(x, y) = \frac{d_{\max}}{r_{\max} - r_{\min}} \left( I_i(x, y) - r_{\min} \right), \qquad (1)$$

where $I_i$ is the $i$th input band, and $I_i'$ is the $i$th output band.
This particular transform will stretch the scene to
completely fill the dynamic range of the display medium. This
does not imply, however, that this process will provide a
good visual representation of the original scene.
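
The gain/offset stretch can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes NumPy is available, takes $r_{\min}$ and $r_{\max}$ to be the minimum and maximum of the band itself, and uses a hypothetical function name `gain_offset`.

```python
import numpy as np

def gain_offset(band, d_max=255.0):
    """Linearly stretch one image band so that it fills [0, d_max]."""
    band = np.asarray(band, dtype=np.float64)
    r_min, r_max = band.min(), band.max()   # assumption: scene range = band range
    if r_max == r_min:                      # flat band: nothing to stretch
        return np.zeros_like(band)
    return d_max / (r_max - r_min) * (band - r_min)

# Hypothetical 2x2 scene occupying only the range 10..50
scene = np.array([[10.0, 20.0], [30.0, 50.0]])
stretched = gain_offset(scene)
print(stretched)
```

Because the mapping is linear, the relative spacing of the gray levels is unchanged; only the end points move, which is why the method cannot bring out shadow detail without clipping highlights.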

Non-linear Point Transforms 

Another well known method used for providing dynamic 
range compression is the application of non-linear trans- 
forms such as the gamma non-linearity, the logarithm func- 
tion, and the power-law function to the original image. 
These functions are typically biased toward increasing the 
‘visibility’ in the ‘dark’ regions by sacrificing the visibil- 
ity in the ‘bright’ areas. The output of such filters can be 
described by 

$$I_i'(x, y) = P[I_i(x, y)], \qquad (2)$$

where P[] represents the point non-linearity. A typical 
point non-linearity is illustrated in Fig. 1. 
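
As an illustration of such point non-linearities, the sketch below implements a gamma (power-law) curve and a logarithmic curve in Python. The normalizations, the default exponent of 0.5 (the ‘square-root’ transform), and the function names are assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np

def gamma_transform(band, gamma=0.5, d_max=255.0):
    """P[I] = d_max * (I / d_max) ** gamma; gamma < 1 lifts the dark regions."""
    band = np.asarray(band, dtype=np.float64)
    return d_max * (band / d_max) ** gamma

def log_transform(band, d_max=255.0):
    """P[I] = d_max * log(1 + I) / log(1 + d_max)."""
    band = np.asarray(band, dtype=np.float64)
    return d_max * np.log1p(band) / np.log1p(d_max)

levels = np.array([0.0, 16.0, 64.0, 255.0])
print(gamma_transform(levels))
print(log_transform(levels))
```

Both curves leave the end points of the range fixed and raise every intermediate level, so dark regions are spread over more output levels while bright regions are compressed into fewer.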
Figure 1: A typical nonlinear point transform function.


Histogram Equalization 

A global technique that works well for a wide variety of 
images is histogram equalization. This technique is based 
on the idea of remapping the histogram of the scene to a 
histogram that has a near-uniform probability density func- 
tion. This results in reassigning dark regions to brighter 
values and bright regions to darker values. Histogram equal- 
ization works well for scenes that have unimodal or weakly 
bi-modal histograms (i.e. very dark, or very bright), but 
not so well for those images with strongly bi-modal his- 
tograms (i.e. scenes that contain very dark and very bright 
regions). 
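
A minimal sketch of histogram equalization for a single 8-bit band follows. It assumes NumPy and the common cumulative-histogram formulation; the function name `equalize` and the synthetic dark test image are hypothetical.

```python
import numpy as np

def equalize(band, levels=256):
    """Remap integer gray levels through the normalized cumulative histogram."""
    band = np.asarray(band)
    hist = np.bincount(band.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / band.size            # empirical CDF, ends at 1.0
    lut = np.round((levels - 1) * cdf).astype(np.uint8)
    return lut[band]                             # apply the look-up table

# Synthetic "very dark" image: all values fall in the lowest quarter of the range
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(64, 64), dtype=np.uint8)
flat = equalize(dark)
print(dark.min(), dark.max(), flat.min(), flat.max())
```

For a unimodal dark image like this one, the remapped levels spread over nearly the full output range. For a strongly bi-modal histogram the same look-up table has to serve both clusters at once, which is where the technique struggles.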

Homomorphic Filtering 

The technique that most resembles ours conceptually and
functionally is homomorphic filtering¹⁰. The general idea
of homomorphic filtering is shown in Fig. 2. The image
is first passed through a logarithmic non-linearity that
provides dynamic range compression. It is then Fourier
transformed, and its representation in the spatial frequency
domain is modified by applying a filter that provides
contrast enhancement. The modified image is then inverse
Fourier transformed and is passed through an exponential
non-linearity that ‘reverses’ the effects of the logarithmic
non-linearity.* Mathematically,

$$s_i(x, y) = \ln[I_i(x, y)] \qquad (3)$$

$$s_i'(\upsilon, \omega) = \mathcal{F}[s_i(x, y)] \qquad (4)$$

$$s_i''(\upsilon, \omega) = H(\upsilon, \omega)\, s_i'(\upsilon, \omega) \qquad (5)$$

$$s_i'''(x, y) = \mathcal{F}^{-1}[s_i''(\upsilon, \omega)] \qquad (6)$$

$$I_i'(x, y) = \exp[s_i'''(x, y)], \qquad (7)$$

where $\mathcal{F}[\cdot]$ and $\mathcal{F}^{-1}[\cdot]$ represent the Fourier and the
inverse Fourier transforms respectively, and $H$ represents
the homomorphic filter. It is in its final exponential transform
that the homomorphic filter differs the most from the
MSRCR. The MSRCR does not apply a final inverse transform
to go back to the original domain!

* A modified color version of the homomorphic filter was proposed by
Faugeras¹¹ in 1979. Our implementation simply applies the black and
white version of the homomorphic filter to each band of the color image
and combines the results to form a color output image.
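
The log, Fourier transform, filter, inverse transform, exponential chain described above can be sketched for one band as follows. The paper does not specify the filter $H$, so the Gaussian high-emphasis shape, its parameters (`cutoff`, `low_gain`, `high_gain`), and the +1 offset that avoids taking the logarithm of zero are all assumptions made for illustration.

```python
import numpy as np

def homomorphic(band, cutoff=0.05, low_gain=0.5, high_gain=1.5):
    """Log -> FFT -> high-emphasis filter -> inverse FFT -> exp, for one band."""
    band = np.asarray(band, dtype=np.float64)
    s = np.log(band + 1.0)                        # log non-linearity (offset avoids log 0)
    s_hat = np.fft.fft2(s)                        # forward Fourier transform
    fy = np.fft.fftfreq(band.shape[0])[:, None]   # frequencies in cycles/pixel
    fx = np.fft.fftfreq(band.shape[1])[None, :]
    rho2 = fx**2 + fy**2
    # Assumed filter: low_gain at zero frequency, rising smoothly towards high_gain
    H = low_gain + (high_gain - low_gain) * (1.0 - np.exp(-rho2 / (2.0 * cutoff**2)))
    s_out = np.real(np.fft.ifft2(H * s_hat))      # filtered log image
    return np.exp(s_out) - 1.0                    # exponential undoes the log

# A constant image contains only the zero-frequency term, so it is scaled by low_gain
flat = homomorphic(np.full((8, 8), 99.0))
print(flat[0, 0])
```

Attenuating the low frequencies of the log image compresses slow illumination changes, while boosting the high frequencies sharpens local contrast. To apply this to a color image as the footnote describes, the function would be called once per band.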

Manual Image Enhancement 

As both professional and amateur photographers face the
limitations of the narrow dynamic range in current printing
technology, and the inadequate performance of image
enhancement algorithms, more and more attention is being
focused on manual enhancement methods. One such technique
is ‘burning-and-dodging’ where different regions of
an image are interactively modified by a user†. The burn
and dodge tool provides the capability of modifying the
color content of a region by using tools of varying sizes
and shapes that work as electronic “scrims.”

Multiscale Retinex with Color Restoration

The general form of the MSRCR can be summarized by
the following equation:

$$\mathcal{R}_{M_i}(x, y) = \sum_{s=1}^{S} w_s \left( \log[I_i(x, y)] - \log[I_i(x, y) * M_s(x, y)] \right), \quad i = 1, \ldots, N, \qquad (8)$$

where $\mathcal{R}_{M_i}$ is the $i$th band of the MSRCR output, $S$ is the
number of scales being used, $w_s$ is the weight of the $s$th scale,
$I_i$ is the $i$th band of the input image, and $N$ is the number
of bands in the input image. The surround function $M_s$ is
defined by

$$M_s(x, y) = K \exp\left[ -(x^2 + y^2) / \sigma_s^2 \right],$$

where $\sigma_s$ is the standard deviation of the $s$th surround function,
and $\iint K \exp\left[ -(x^2 + y^2) / \sigma_s^2 \right] dx\, dy = 1$. The number
of scales, $S$, and the widths of the surround functions,
$\sigma_s$, are image independent‡. In other words, these have
been chosen to maximize enhancement for a large§ number
of images. Once the constants have been selected,
the process is truly automatic and independent of the
variations in scene statistics.
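
The weighted sum of log-ratio terms can be sketched for one band as below. This is only an illustration under stated assumptions: the surround is a Gaussian normalized to unit sum, the convolution is done circularly with NumPy's FFT, the three scale values and equal weights are placeholders rather than the constants the authors selected, a +1 offset avoids taking the logarithm of zero, and no color restoration step is included.

```python
import numpy as np

def surround(shape, sigma):
    """Gaussian surround K * exp(-(x^2 + y^2) / sigma^2), normalized to sum to 1."""
    y = np.fft.fftfreq(shape[0]) * shape[0]     # wrapped pixel offsets 0, 1, ..., -1
    x = np.fft.fftfreq(shape[1]) * shape[1]
    g = np.exp(-(x[None, :]**2 + y[:, None]**2) / sigma**2)
    return g / g.sum()

def multiscale_retinex(band, sigmas=(15.0, 80.0, 250.0), weights=None):
    """Sum over scales of w_s * (log I - log(I * M_s)) for one band."""
    band = np.asarray(band, dtype=np.float64) + 1.0   # offset avoids log(0)
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)   # assumed equal weights
    band_hat = np.fft.fft2(band)
    out = np.zeros_like(band)
    for w, sigma in zip(weights, sigmas):
        m_hat = np.fft.fft2(surround(band.shape, sigma))
        blurred = np.real(np.fft.ifft2(band_hat * m_hat))   # I * M_s (circular)
        out += w * (np.log(band) - np.log(blurred))
    return out

# One bright pixel on a dark background
img = np.full((32, 32), 10.0)
img[16, 16] = 200.0
msr = multiscale_retinex(img)
print(msr[16, 16], msr[0, 0])
```

Note that the output stays in the log domain: unlike the homomorphic filter, no exponential is applied at the end, so a gain and offset are still needed to map the result to display values.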

† Adobe Photoshop 4.0, a commercial photo manipulation software
package, provides a burn and dodge tool.

‡ Typically for 512 × 512 images. The $\sigma_s$ may change with the
dimension of images.

§We have not yet found an exception after having processed 1000+ 
images! 




Comparison 

We have compared the MSRCR with all of the image enhancement
techniques described above. We present the results
in Figs. 3 and 4, and the comparison with
manual burning and dodging separately in Fig. 5.

Point operations 

Figure 3 shows a collage of images that compares the out- 
put of the MSRCR with the point transforms. As can be 
seen, the MSRCR provided the best overall visual quality 
in each case. The techniques such as histogram equaliza- 
tion perform well for a wide range of scenes, but they also 
fail for a large set. The MSRCR outperforms the other 
methods universally. 

Homomorphic filtering 

Figure 4 shows a comparison of the MSRCR with homo- 
morphic filtering. The homomorphic filter consistently pro- 
vided excellent dynamic range compression but is lacking 
in final color rendition. The output of the homomorphic 
filter in effect appears extremely hazy compared with the 
output of the MSRCR though the dynamic range compres- 
sion of the two methods appears to be comparable. 

Manual Burning and Dodging 

Figure 5 shows a comparison of the MSRCR with the re- 
sults obtained by using manual burning and dodging. The 
manually processed image shows an improvement over the 
original as far as the information and detail in the dark ar- 
eas is concerned but it lacks the vividness and color satu- 
ration that the MSRCR image retains and even enhances. 
There is obvious streaking from the very local operation of 
the tool stroke — this could be eliminated but only at the ex- 
pense of adding considerably to the total processing time. 
In the high detail areas where there are sharp differences 
in reflectance, a tool with size approaching that of a single 
pixel would be required to bring out all the details. Since 
the time needed for enhancing a region is roughly in in- 
verse proportion to the size of the tool being used for the 
processing, this suggests that a very large amount of time 
would be needed to perform such an enhancement. On a
scene-by-scene basis, the time and effort required for manual
manipulation can be reasonable, but the MSRCR produces
images that are equivalent or better in quality in a
fraction of the time. Because the visual quality of manual
burning and dodging is limited solely by the patience
and time commitment of the user, the case shown is perhaps
typical of the performance achieved by the persistent
non-specialist.


Conclusions 


We have provided a brief description of the most com- 
monly used image enhancement techniques and compared 
their operation with the multiscale retinex with color restora- 
tion. We have shown that the MSRCR outperforms these 
techniques in all cases in terms of dynamic range compres- 
sion achieved, and the rendition of the final color image. 
The automatic nature of the process also enables us to use 
the same set of parameters ‘blindly’ for each and every im- 
age that is encountered. Of course, there are a few images 
for which the MSRCR has sub-par performance. But these 
are fairly rare and generally relate to defects in the orig- 
inal image data — such as preferential clipping of a spec- 
tral band. We are currently investigating methods to detect 
such scenes and adaptively adjust the MSRCR to correct 
for these sub-par performances. 
Figure 3: A comparison of the MSRCR with point operations. Top row: original; second row: histogram equalization; third row:
gain/offset; fourth row: gamma non-linearity; bottom row: MSRCR.




(a) Original (b) Homomorphic filter (c) MSRCR 

Figure 4: A comparison of the MSRCR with images enhanced by homomorphic filtering. The dynamic range compression achieved by 
the two methods is comparable, but the MSRCR produces images that possess much better contrast and sharper colors. 



(a) Original (b) Manual burning and dodging (c) MSRCR 

Figure 5: Comparison of the MSRCR with manual ‘burning-and-dodging.’ The manually enhanced image was produced using the
burning and dodging tool provided in Adobe Photoshop 4.0. Circular tools with soft edges were used to modify the color content of
different regions. The total time to produce this enhanced image was 20 minutes. The MSRCR image took 45 seconds on a Pentium Pro
200 MHz machine.




[7] Z. Rahman, D. Jobson, and G. A. Woodell, “Multiscale 
retinex for color rendition and dynamic range compres- 
sion,” in Applications of Digital Image Processing XIX 
(A. G. Tescher, ed.), Proc. SPIE 2847, 1996. 

[8] D. Jobson, Z. Rahman, and G. A. Woodell, “Retinex image 
processing: Improved fidelity for direct visual observation,” 
in Proceedings of the IS&T Fourth color Imaging Confer- 
ence: Color Science, Systems, and Applications, pp. 124- 
126, IS&T, 1996. 

[9] R. C. Gonzalez and P. Wintz, Digital Image Processing. 
Reading, MA: Addison-Wesley, second ed., 1987. 

[10] T. G. Stockham, Jr., “Image processing in the context of
a visual model,” Proceedings of the IEEE, vol. 60, no. 7,
pp. 828-842, 1972.

[11] O. D. Faugeras, “Digital color image processing within the
framework of a human visual model,” IEEE Transactions on
Acoustics, Speech and Signal Processing, vol. 27, pp. 380-393,
Aug. 1979.



An Information Theory of Visual 
Communication 


F. O. Huck, C. L. Fales and Z. Rahman 

Philosophical Transactions of the Royal Society A: 
Physical Sciences and Engineering 
(October 1996) 



Philosophical Transactions of the Royal Society of London, Series A:
Mathematical, Physical and Engineering Sciences

Volume 354, Number 1716, Pages 2193-2287, 15 October 1996

CONTENTS

F. O. Huck, C. L. Fales & Z. Rahman
An information theory of visual communication, pages 2193-2248

C. L. Fales, F. O. Huck, R. Alter-Gartenberg & Z. Rahman
Image gathering and digital restoration, pages 2249-2287




This issue was produced by using the TeX typesetting system

Published in Great Britain by the Royal Society, 6 Carlton House Terrace, London SW1Y 5AG
Printed in Great Britain for the Royal Society by the University Press, Cambridge


An information theory of visual communication 

By Friedrich O. Huck¹, Carl L. Fales¹ and Zia-ur Rahman²

¹ NASA Langley Research Center, Hampton, VA 23681, USA
² Science and Technology Corporation, Hampton, VA 23666, USA


Contents

1. Introduction
2. Image gathering and reproduction
   (a) Image gathering
   (b) Image reconstruction
   (c) Image restoration
   (d) Image enhancement
3. Figures of merit
   (a) Information rate $\mathcal{H}$
   (b) Theoretical minimum data rate $\mathcal{E}$
   (c) Information efficiency $\mathcal{H}/\mathcal{E}$
   (d) Maximum-realizable fidelity $\mathcal{F}$
   (e) Information rate $\mathcal{H}_0$
   (f) Maximum-realizable fidelity $\mathcal{F}_0$
4. Multiresolution decomposition
   (a) Single-level transform
   (b) Wavelet transform
5. Quantitative and qualitative assessments
   (a) Simulation
   (b) Image gathering and transmission
   (c) Image gathering and reproduction
   (d) Multiresolution decomposition
6. Conclusions
Appendix A. Electro-optical design
Appendix B. Insufficient sampling
Appendix C. Quantization
Appendix D. Image restoration without interpolation
References


The fundamental problem of visual communication is that of producing the best 
possible picture at the lowest data rate. We address this problem by extending in- 
formation theory to the assessment of the visual communication channel as a whole, 
from image gathering to display. The extension unites two disciplines, the electro- 
optical design of image gathering and display devices and the digital processing for 
image coding and restoration. The mathematical development leads to several intu- 
itively attractive figures of merit for assessing the visual communication channel as 
a function of the critical limiting factors that constrain its performance. Multires- 
olution decomposition is included in the mathematical development to optimally 
Properties and Performance
of a Center/Surround Retinex

Daniel J. Jobson, Zia-ur Rahman, Member, IEEE, and Glenn A. Woodell


Abstract — The last version of Land’s retinex model for human 
vision’s lightness and color constancy has been implemented and 
tested in image processing experiments. Previous research has es- 
tablished the mathematical foundations of Land’s retinex but has 
not subjected his lightness theory to extensive image processing 
experiments. We have sought to define a practical implementation 
of the retinex without particular concern for its validity as a 
model for human lightness and color perception. Here we describe 
the trade-off between rendition and dynamic range compression 
that is governed by the surround space constant. Further, unlike 
previous results, we find that the placement of the logarithmic 
function is important and produces best results when placed after 
the surround formation. Also unlike previous results, we find best 
rendition for a “canonical” gain/offset applied after the retinex 
operation. Various functional forms for the retinex surround are
evaluated, and a Gaussian form is found to perform better than the
inverse square suggested by Land. Images that violate the gray
world assumptions (implicit to this retinex) are investigated to
provide insight into cases where this retinex fails to produce a
good rendition.

I. Introduction 

OF THE MANY visual tasks accomplished so gracefully
by human vision, one of the most fundamental and 
approachable for machine vision applications is lightness and 
color constancy. While a completely satisfactory definition is 
lacking, lightness and color constancy refer to the resilience 
of perceived color and lightness to spatial and spectral il- 
lumination variations. Various theories for this have been 
proposed and have a common mathematical foundation [1]. 
The last version of Land’s retinex [2] has captured our atten- 
tion because of the ease of implementation and manipulation 
of key variables, and because it does not have “unnatural” 
requirements for scene calibration. Likewise, the simplicity 
of the computation was appealing and initial experiments 
produced compelling results. This version of the retinex has 
been the subject of previous digital simulations that were 
limited because of lengthy computer time involved and was 
implemented in analog very large-scale integrated circuits 
(VLSI) to achieve real-time computation [3], [4]. Evidence
that this retinex version is an optimal solution to the lightness
problem has come from experiments posing Land’s Mondrian
target, randomly arranged two-dimensional (2-D) gray patches,
as a problem in linear optimization and a learning problem for
back propagated artificial neural networks [5], [6].

Fig. 1. Spatial form of the center/surround retinex operator. (a) 3-D
representation (distorted to visualize surround). (b) Cross-section to
illustrate wide weak surround.

The utility of a lightness-color constancy algorithm for 
machine vision is the simultaneous accomplishment of: 

1) dynamic range compression; 

2) color independence from the spectral distribution of the
scene illuminant;

3) color and lightness rendition. 

Land’s center/surround retinex demonstrably achieves the
first two, although Land emphasized primarily the color
constancy properties. Well-known difficulties arise, though, for
color and lightness rendition [1], [3], [6]. These consist of
i) lightness and color “halo” artifacts that are especially
prominent where large uniform regions abut to form a high
contrast edge with “graying” in the large uniform zones in an
image, and ii) global violations of the gray world assumption
(e.g., an all-red scene) which result in a global “graying
out” of the image. Clearly, the retinex (perhaps like human
vision) functions best for highly diverse scenes and poorest
for impoverished scenes. This is analogous to systems of
simultaneous equations where a unique solution exists if and
only if there are enough independent equations.

Fig. 2. Demonstration of retinex color constancy and dynamic range compression (prior to optimizing rendition) for a Gaussian surround with small
space constant (15 pixels).

The general form of the center/surround retinex (Fig. 1) is 
similar to the difference-of-Gaussian (DOG) function widely 
used in natural vision science to model both the receptive 
fields of individual neurons and perceptual processes. The only 
extensions required are i) to greatly enlarge and weaken the 
surround Gaussian (as determined by its space and amplitude 
constants), and ii) to include a logarithmic function to make 
subtractive inhibition into a shunting inhibition (i.e., arithmetic 
division). We have chosen a Gaussian surround form whereas
Land opted for a $1/r^2$ function [2] and Moore et al. [3] used a
different exponential form. These will be compared in Section
II. Mathematically, this takes the form

$$R_i(x, y) = \log I_i(x, y) - \log[F(x, y) * I_i(x, y)] \qquad (1)$$

where $I_i(x, y)$ is the image distribution in the $i$th color
spectral band, “$*$” denotes the convolution operation, $F(x, y)$
is the surround function, and $R_i(x, y)$ is the associated retinex
output.

This operation is performed on each spectral band to produce
Land’s triplet values specifying color and lightness. It is
readily apparent that color constancy (i.e., independence from
single source illuminant spectral distribution) is reasonably
complete since

$$I_i(x, y) = S_i(x, y)\, r_i(x, y) \qquad (2)$$

where $S_i(x, y)$ is the spatial distribution of the source
illumination and $r_i(x, y)$ the distribution of scene reflectances
(integrated over the spectral band response), so that

$$R_i(x, y) = \log \frac{S_i(x, y)\, r_i(x, y)}{\bar{S}_i(x, y)\, \bar{r}_i(x, y)} \qquad (3)$$

where the bars denote the spatially weighted average value.
As long as $S_i(x, y) \approx \bar{S}_i(x, y)$, then

$$R_i(x, y) \approx \log \frac{r_i(x, y)}{\bar{r}_i(x, y)}. \qquad (4)$$

The approximate relation is an equality for many cases and, 
for those cases where it is not strictly true, the reflectance ratio 
should dominate illumination variations. 
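
The cancellation of the illuminant can be checked numerically. The sketch below is a simplified illustration, not the retinex as tested in the paper: it assumes a spatially uniform illuminant and replaces the weighted surround average with the global mean of the band; the reflectance values and the function name `retinex_global` are hypothetical.

```python
import numpy as np

def retinex_global(band):
    """Log ratio of each pixel to the band mean (simplest possible surround)."""
    band = np.asarray(band, dtype=np.float64)
    return np.log(band) - np.log(band.mean())

reflectance = np.array([[0.1, 0.4], [0.8, 0.2]])   # hypothetical scene reflectances
bright = 1000.0 * reflectance                      # uniform illuminant, S = 1000
dim = 35.0 * reflectance                           # same scene, S = 35

print(retinex_global(bright))
print(retinex_global(dim))
```

Both calls give the same output, the log of each reflectance relative to the mean reflectance, because the constant illuminant factor appears in both the numerator and the denominator of the ratio.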

Color constancy is demonstrated (Fig. 2) for the extreme 
cases of blue skylight illumination, direct sunlight only, and 
tungsten illumination. Actual daylight illumination should fall 
arbitrarily somewhere between the first two cases. Film and 
electronic cameras without computational intervention or film 
selection would produce the top row of images. Dynamic range 
compression is also readily demonstrated (Fig. 2, right) with 
computer simulation. Here the original image data is multiplied 
by a hyperbolic tangent “shadow.” Again, cameras without 
computation produce the upper result (or with a change of 
f/stop or exposure would bring out the shadowed detail but 
at the expense of saturating the nonshadowed image zones). 
Strikingly, color balance is retained across the wide dynamic 
range encompassed and the highly nonlinear operation of the 
retinex. 

These two examples do, however, point to the difficulty of 
realizing satisfactory color rendition in contrast to the ease 
of achieving color constancy and dynamic range compres- 
sion. Taken together, this discussion indicates the exciting 
possibilities that motivated us to engage in more extensive 
investigation. 
Fig. 3. Examples of serious photographic defects due to spectral and/or spatial illumination variations. (a) “Green” kitchen due to fluorescent illumination.
(b) Sodium vapor illumination. (c) Tungsten indoors/daylight outdoors. (d) Obscured foreground.


The need for dynamic range compression and color con- 
stancy, especially if both are accomplished simultaneously by 
a simple real-time algorithm, is well known to photographers. 
Discrepancies between the photographer’s perception through 
the viewfinder and the captured film image can be quite bizarre 
(Fig. 3), and require constant vigilance to avoid impossible 
lighting situations and to carefully select the appropriate film 
and processing for the illuminant’s spectral distribution. The 
fundamental limit [3] is recognized to be the film or cathode 
ray tube’s (CRT’s) narrow dynamic range and static spectral 
response. Print/display dynamic range constraints of 50 : 1 are, 
however, compatible with the magnitude of scene reflectance 


variations. Except for extreme cases (snow or lampblack) 
reflectance variations are only 20:1 [7] and often much 
less. Thus, even the extremes of reflectance of ≈ 50:1 are
easily spanned by print/display media. Clearly illumination 
variations are the culprit which human visual perception has 
overcome by eye-brain computation. Electronic still cameras 
have an intrinsically high dynamic range (> 2000:1) [8] 
set by the detector array electronics, and an even higher 
dynamic range within the detector array proper, since the 
limiting factor is usually the preamplifier noise added in 
transferring image signals off-chip or digitization noise added 
subsequently. Therefore, at least for electronic still cameras,
we can conclude that sufficient dynamic range is available to
retain the full variations of both illumination and reflectance
in arbitrary scenes. So it is certainly reasonable to consider
either analog [3] implementations of compression/constancy
or digital implementation if the initial A/D conversion is done
at 10-14 bits (b), rather than the usual 8 b.

Fig. 4. Demonstration of improved rendition obtained applying the log response after surround formation (space constant = 80 pixels). Panels: log before surround; log after surround.
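
As a rough check on the bit depths quoted in the text, assume an ideal linear quantizer whose levels must span the full ratio (an assumption made here for illustration only). The number of bits $b$ needed for a dynamic range of $D$:1 is then about $\log_2 D$:

```latex
\[
  b \approx \lceil \log_2 D \rceil, \qquad
  \lceil \log_2 2000 \rceil = \lceil 10.97 \rceil = 11, \qquad
  \lceil \log_2 50 \rceil = \lceil 5.64 \rceil = 6, \qquad
  2^{8} = 256 .
\]
```

Under this assumption, 8 b linear quantization spans only about 256:1, well short of the > 2000:1 quoted for the detector electronics, whereas 11 b or more covers it. This is consistent with the 10-14 b range suggested for the initial A/D conversion.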

Recent advances in high-speed computing led us to re- 
consider both extensive digital simulations of the retinex 
and real-time digital implementations for practical use in 
future electronic camera systems. The hours of computer time 
previously reported [3] are now reduced to minutes and real- 
time implementations using specialized digital hardware such 
as digital signal processing (DSP) chips seem reasonable. 
In other words, the full image dynamic range is available 
from current electronic cameras, real-time computation is 
realizable, and the ultimate bottleneck is only at the first 
print/display. Obviously, there are image coding aspects to 
both dynamic range compression and color constancy. We 
will touch upon these briefly but concentrate primarily on the 
design of the algorithm to produce combined dynamic range 
compression/color constancy/color-lightness rendition. 

We have seen that the center/surround retinex is both color 
constant and capable of a high degree of dynamic range 
compression. It remains, then, to specify an implementation 
that produces satisfactory rendition and examine alternatives 
to determine if other design options are equally good or 
better. Because the retinex exchanges illumination variations 
for scene reflectance context dependency [9], scene content 
becomes a major issue especially when it deviates from 
regionally gray average values — the “gray world” assumption 
[1]. Therefore, testing with diverse scenes, including random 
ones, is important to pinpoint possible limits to the generality 
of this retinex. 

Initial image processing simulations revealed the following 
unresolved implementation issues: 

1) the placement of the log function; 

2) the functional form of the surround; 


3) the space constant for the surround; 

4) the treatment of the retinex triplets prior to display.

These will now be explored more comprehensively. The
results of testing the optimized algorithm on diverse scenes
will then be presented with special emphasis on “gray-world”
violations. Finally, the relationship of the algorithm to
neurophysiology will be examined briefly.

II. Issues 

A. Placement of Log Function 

Previous research [3], [6] has largely concluded that the 
logarithm can be taken before or after the formation of the 
surround. Processing schemes [3], [6], [10] adhering closely to 
natural vision science, i.e., an approximate log photoreceptor 
response, favor placing log response at the photodetection 
stage prior to any surround formation. Our preliminary testing 
of this produced rather disappointing results and prompted us 
to reopen this seemingly decided issue. Initial testing of the 
postsurround log produced encouraging results with much less 
emphatic artifacts. Mathematically, we have that 

Ri = log I{x, y) - log[7(x, y) * F(x. y)} (5) 

and 

R 2 = \ogI(x, y) - {[log/(x. y)] * F(x.y)} (6) 

are not equivalent. The discrete convolution [log I(x,y) * 
F(x, y)] is, in fact, equivalent to a weighted product of I(x . y ), 
whereas the second term in (5) is a weighted sum. This is 
closely related to the difference between the arithmetric mean 
and the geometric mean except that F(x< y) is selected so that 

JJ F(x,y) dxdy — 1 (7) 

which does not produce exactly the nth root of n numbers 
as the geometric mean would. Since the entire purpose of 
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log(r) 

Fig. 5. Comparison of three surround functions — inverse square, exponen- 
tial, and Gaussian, normalized to equal full-width half-max (FWHM) response. 
The log(r) scale is necessary for comparison purposes but does diminish the 
differences between the functions. A linear r scale (if it were graphically 
feasible) would show very dramatic differences. The space constants are c i = 
50 pixels, C 2 = 72 pixels, and C 3 = 60 pixels. 

the log operation is to produce a point by point ratio to a 
large regional mean value, (5) seems the desired form and 
our image processing experiments bear out this preference. A 
typical example is shown in Fig. 4. While the halo artifact
for (6) can be diminished by manipulation of the gain and
offset, this results in a significant desaturation of color. In
other examples, more severe color distortions occur, which
likewise cannot be removed by manipulation of the gain/offset.
In addition, a shadow simulation indicates much less dynamic
range compression for (6). Therefore, we have selected the (5)
form for our testing and optimization. This form is also that
given in Land’s original presentation [2], though he is quoted
as feeling the two forms were equally useful in practice [6].
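
The difference between forms (5) and (6) can be checked numerically for a single pixel. The following is a minimal sketch, not the authors' code: the intensities and the uniform surround weights are made-up values chosen only for illustration. It evaluates the surround term as the log of a weighted sum, as in (5), and as a weighted sum of logs, as in (6).

```python
import math

# Hypothetical intensities in one surround neighbourhood (assumed values).
intensities = [10.0, 20.0, 40.0, 200.0]
# Surround weights F, normalized to sum to 1 as in (7); uniform for simplicity.
weights = [0.25, 0.25, 0.25, 0.25]

center = intensities[0]

# Surround term of (5): log of the weighted sum (weighted arithmetic mean).
log_of_mean = math.log(sum(w * i for w, i in zip(weights, intensities)))

# Surround term of (6): weighted sum of logs (log of a weighted geometric mean).
mean_of_log = sum(w * math.log(i) for w, i in zip(weights, intensities))

r1 = math.log(center) - log_of_mean   # form (5)
r2 = math.log(center) - mean_of_log   # form (6)

print(f"R1 = {r1:.4f}, R2 = {r2:.4f}")
```

Because the arithmetic mean is never smaller than the geometric mean, the surround term of (5) is never smaller than that of (6), so R_1 <= R_2 for any positive intensities and normalized weights. The gap grows with the spread of intensities inside the surround.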


B. The Surround Function 

Land proposed an inverse square spatial surround 

F(x', y') = 1 / r^2    (8)

where

r = \sqrt{x'^2 + y'^2}

which can be modified to be dependent on a space constant as

F(x', y') = \frac{1}{1 + r^2 / c_1^2}    (9)

Moore et al. [3] examined an exponential “absolute value”

F(x', y') = e^{-|r| / c_2}    (10)

because it is an approximation to the spatial response of analog
VLSI resistive networks, and Hurlbert [6] investigated the
Gaussian

F(x', y') = e^{-r^2 / c_3^2}    (11)


because of its widespread use in natural and machine vision 
modeling. A cross section of these 2-D functions (Fig. 5) 
shows that for any particular choice of space constant, the 
inverse square rolls off very rapidly, but ultimately retains a 
higher response to quite distant image pixels than the expo- 
nential and Gaussian forms. At distant values, the exponential 
ultimately exceeds the Gaussian response, so that in general the 
inverse square is consistently more “global,” the exponential 
is less so, and the Gaussian is more distinctively “regional.” 
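
The relative behavior of the three surrounds can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It assumes that (9) reads 1/(1 + r^2/c_1^2) and that (11) reads exp(-r^2/c_3^2), and it uses the space constants quoted in the Fig. 5 caption. The helper `fwhm` is a hypothetical name introduced here.

```python
import math

# Space constants from the Fig. 5 caption, in pixels.
C1, C2, C3 = 50.0, 72.0, 60.0

def inverse_square(r):
    # Space-constant form (9); assumed reading: 1 / (1 + r^2 / c1^2).
    return 1.0 / (1.0 + (r / C1) ** 2)

def exponential(r):
    # Form (10): exp(-|r| / c2).
    return math.exp(-abs(r) / C2)

def gaussian(r):
    # Form (11); assumed reading: exp(-r^2 / c3^2).
    return math.exp(-(r / C3) ** 2)

def fwhm(f, hi=1000.0):
    # Bisection for the radius where f falls to half its peak f(0) = 1.
    # Each f decreases monotonically in r, so the crossing is unique.
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo

for name, f in [("inverse square", inverse_square),
                ("exponential", exponential),
                ("Gaussian", gaussian)]:
    print(f"{name:15s} FWHM = {fwhm(f):6.1f}  F(500) = {f(500.0):.3e}")
```

Under these assumed forms, all three functions have an FWHM close to 100 pixels, which is consistent with the caption's statement that they are normalized to equal FWHM. At r = 500 pixels the inverse square response is the largest, the exponential is smaller, and the Gaussian is negligible, matching the ordering from "global" to "regional" described above.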

In initial tests, no space constant for the inverse square 
surround could be found that achieved reasonable dynamic 
range compression, i.e., adequate enhancement of shadowed 
detail. The best performance is shown in Fig. 6. In contrast,
both the exponential and Gaussian forms produced good
dynamic range compression over a range of space constants.
Because the Gaussian offered the most experimental flexibility
(good performance over a wider range of space constants), it
was selected for this implementation. It is likely that the
exponential is equally useful and this is clearly of importance 
for analog VLSI resistive network hardware implementations 
of retinex computations. 

C. Surround Space Constant 

While Land proposed the center/surround retinex with a 2-4
pixel diameter for the center (perhaps in keeping with the 
widely known coarser spatial resolution of purely chromatic 
vision), a center of only 1 pixel is clearly demanded for 
general-purpose image processing. Only after segmentation 
into lightness and chromatic images can the purely chromatic 
images be made coarser. In contrast, the surround space con- 
stant cannot be so clearly defined. Land proposed an inverse 
square surround with a full width-half maximum (FWHM)
of 40° of visual angle. This corresponds to FWHM of about
270 visual pixels (assuming a visual pixel is ≈ 0.015°).
We examined the performance of the Gaussian surround over
a wide range of space constants. Since previous research [6]
found variations in the space constant with the spatial variation 
in shadow profiles, a particular concern is the question of 
an optimum space constant that gives good performance for 
diverse scenes and lighting conditions. 

The image sequence (Fig. 7) established a trade-off that has 
not been previously studied. In varying the space constant from 
small to large values, dynamic range compression is sacrificed 
for improved rendition. The middle of this range (50 < c_3 <
100 pixels) represents a reasonable compromise, where shadows
are fairly compensated and rendition achieves acceptable
levels of image quality. This is qualitatively compatible with 
human visual perception in that the treatment of shadows is 
influenced by their spatial extent. Larger shadows tend to be 
more compensated (less dark) while smaller shadows appear 
less compensated (blacker and with less visible internal detail). 

While we are not concerned with defining a form of 
the retinex that accurately models human vision, we must 
ultimately compare performance to that of human perception 
in order to meet basic image quality requirements. Our intent, 
then, is to find a form of the retinex that is functionally 
equivalent to human visual perception. Since the performance 
[Fig. 6 panels, left to right: inverse square (c_1 = 50), exponential (c_2 = 72), Gaussian (c_3 = 60).]

Fig. 6. Comparison of visual performance of three surround functions arranged from left to right in order of increasing dynamic range compression. 



Fig. 7. Trade-off between dynamic range compression and color rendition for the Gaussian surround. Small space constants produce excellent dynamic 
range compression, while large constants produce the best rendition. 


of human vision for complex natural images has not been 
comprehensively defined, we are left with purely subjective 
assessments of image quality. Since the retinex is, to some 
extent, compensating for lighting variations and approximating 
a “reflectance world,” there are two directions available for 
assessment. First is the psychophysical comparison of
human observation of the scene with the processed and
displayed image. Second is the quantitative comparison of the


processed/displayed image to the measured scene reflectance 
values. The latter approach is replete with problems since 
lighting variations are clearly not completely removed by 
human visual perception. If, however, we pursue additional 
computation to segment lightness and chromatic images, the 
chromatic images are likely to be measures of relative spectral 
reflectance ratios that can be compared with scene reflectances 
to establish a figure-of-merit. Here, we will rely only on the 
Fig. 8. Schematic of a characteristic retinex histogram illustrating the final
gain/offset selection applied uniformly to the three color subimages. 


first comparison, since we are examining the overall utility of 
the computation to enable electronic imagery to be as resilient 
as the human observer of the same scene and not lose or distort 
major semantic information that would have been obtained by 
direct observation. The second approach will, perhaps, become 
more important in scientific data analysis such as multispectral 
classification in remote sensing imaging. 

D. Treatment of Retinex Output Prior to Display 

During initial experiments, we were surprised to find a 
characteristic form for the histograms of diverse scenes after 
the retinex operation (Fig. 8). Exceptions were for severe 
violations of the “gray world” assumption, e.g., an all-red
scene. These violations are explored in a subsequent section, so 
here we will examine a natural image with reasonable scene 
diversity. 

Land's proposal [2] of the center/surround retinex does 
not explicitly address the issue of a final treatment with the 
possible implication that none is necessary. On the other hand, 
Moore et al. [3] advocate the automatic gain/offset approach, 
whereby the triplet retinex values are adjusted by the absolute 
maximum and minimum found across all values in all the 
color bands. Our own empirically derived approach (Fig. 8) 
differs from either of these in that a constant gain/offset 
is selected for best color rendition. This results in actually 
clipping some of both the highest and lowest signal transitions. 
Little information is lost because the retinex output signals 
form, to a large degree, a contrast image (being in essence 
a ratio). This constant gain/offset has thus far proven to be 
independent of image/scene content. Our approach, otherwise, 
agrees with Moore et al. in that a final gain and offset is 
uniformly applied to all pixels in all three color bands. A 
comparison of these two approaches is illustrated (Fig. 9) to 
underline the considerable visual differences encountered. We 
speculate that the significant deviations from the characteristic 
histogram that occur for gross violations of the gray-world 
assumption could be used to detect errors. The gain/offset 


appears to be invariant from image to image, so that we 
have the sense that it is canonical and, therefore, satisfies the 
original intent of Land to produce a general computation that 
applies to most images. The term “canonical” refers to the
post-retinex gain/offset being general constants that do not
vary either from image to image or from band to band.
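
The two final treatments can be contrasted on a small set of values. The following is a sketch under stated assumptions: the text gives no numerical gain or offset, so `GAIN`, `OFFSET`, and the sample retinex outputs are hypothetical values chosen only to show the mechanism.

```python
def auto_gain_offset(values, out_max=255.0):
    # Stretch so the global minimum maps to 0 and the global maximum to out_max
    # (the automatic approach attributed to Moore et al. in the text).
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    scale = out_max / (hi - lo)
    return [(v - lo) * scale for v in values]

def constant_gain_offset(values, gain, offset, out_max=255.0):
    # Apply one fixed gain/offset to every value, then clip to the display range.
    return [min(out_max, max(0.0, gain * (v + offset))) for v in values]

# Hypothetical retinex outputs pooled over all three bands (assumed values):
# mostly small log ratios around zero, plus two extreme excursions.
retinex = [-3.0, -0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 2.5]

# Hypothetical constants chosen only for this illustration.
GAIN, OFFSET = 127.5, 1.0

auto = auto_gain_offset(retinex)
fixed = constant_gain_offset(retinex, GAIN, OFFSET)

print([round(v) for v in auto])
print([round(v) for v in fixed])
```

With these assumed values, the automatic stretch is dictated by the two extreme excursions, so the bulk of the values is squeezed into a narrow part of the display range. The constant gain/offset clips the extremes and leaves more of the range for the bulk, which is the behavior the text describes for the canonical treatment.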

E. Summary 

The specific implementation we have defined from prelimi- 
nary testing is a center/surround operation with the following 
characteristics: 

1) the spatial extent of the center is the individual pixel, 
which can be thought of as a small Gaussian defined by 
the optical blur function of the imaging optics; 

2) the form of the surround is Gaussian; 

3) the spatial extent of the surround is that for a Gaussian
space constant of about 80 pixels (which corresponds to 
an FWHM spread of 210 pixels); 

4) the logarithm is applied after surround formation by 2-D 
spatial convolution; 

5) a “canonical” gain/offset is applied to the retinex output
which, in signal terms, clips some of the highest and 
lowest signal excursions. The gain and offset are general 
constants that do not vary either from image to image 
or between color bands. 

Our implementation differs from previous ones in that Land
[2] proposed an inverse square surround while Moore et al.
[3] and Hurlbert [6] concentrated on placement of the log
prior to surround formation (or else considered placement as 
interchangeable). Finally, Moore et al. specified an automatic 
gain/offset process rather than the canonical one used here. All 
of these differences were shown to result in significant visual 
effects on processed images. 
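
The five characteristics listed in this summary can be sketched for one color band as follows. This is a minimal illustration, not the authors' implementation. The truncation radius, the edge replication at image borders, the tiny synthetic image, and the values of `c`, `gain`, and `offset` are all assumptions introduced here, and the space constant is scaled down from 80 pixels to suit a 16-pixel-wide test image.

```python
import math

def gaussian_kernel(c, radius):
    # Gaussian surround exp(-r^2 / c^2), normalized so its weights sum to 1 as in (7).
    k = [[math.exp(-(dx * dx + dy * dy) / (c * c))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def single_scale_retinex(band, c, gain, offset):
    # band: 2-D list of positive intensities for one color band.
    h, w = len(band), len(band[0])
    radius = int(3 * c)          # truncation radius: an assumption of this sketch
    kernel = gaussian_kernel(c, radius)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            surround = 0.0
            for j in range(-radius, radius + 1):
                yy = min(h - 1, max(0, y + j))      # replicate edges (assumption)
                for i in range(-radius, radius + 1):
                    xx = min(w - 1, max(0, x + i))
                    surround += kernel[j + radius][i + radius] * band[yy][xx]
            # Log applied after surround formation, as in (5).
            r = math.log(band[y][x]) - math.log(surround)
            # Constant gain/offset, clipped to an 8-bit display range.
            row.append(min(255.0, max(0.0, gain * (r + offset))))
        out.append(row)
    return out

# Tiny synthetic band: a bright left half and a shadowed right half (8x darker),
# each containing the same reflectance step pattern.
pattern = [1.0, 2.0] * 4
band = [[100.0 * p for p in pattern] + [12.5 * p for p in pattern] for _ in range(8)]

# Hypothetical space constant and gain/offset, chosen only for this illustration.
result = single_scale_retinex(band, c=2.0, gain=100.0, offset=1.28)
print([round(v) for v in result[4]])
```

In this synthetic case, pixels with the same reflectance in the lit and shadowed halves differ by a factor of 8 in the input but come out with similar display values away from the shadow boundary, which is the dynamic range compression described in the text. For a three-band image the same function would be applied to each band with the same gain and offset.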

III. Results 

Because the mathematics, though simple, involve a non- 
linearity coupled to large-scale spatial interactions, the per- 
formance on complex images is not predictable. The only 
recourse is to apply the method to diverse images in hopes 
of exposing limitations and distortions. The performance on 
images not meeting the regional gray-world assumption is 
examined to attempt to define ways to detect and minimize 
or correct errors if they occur in some systematic fashion. It 
should be clear that while the dynamic range compression and 
color constancy are readily achievable, the goal of rendition 
poses a great challenge. 

Rendition is as difficult to define as it is to achieve. 
Our working definition is that rendition means producing a 
resultant displayed image that is convincingly like what a 
human observer would see when examining the same scene 
as the camera does. Therefore rendition means fidelity both 
to the scene and to human perception. This is by necessity 
a qualitative criterion because, while we can quantify the 
scene, the current state of color psychophysics does not 
provide the ability to quantify color perception in complex 
scenes. Our working criterion is to compare the original and
processed images visually and, where possible, to compare the 
[Fig. 9 panels: auto gain/offset; “canonical” gain/offset.]

Fig. 9. Comparison of the visual performance of auto gain/offset versus “canonical” gain/offset. The auto gain/offset is selected on the absolute maximum
and minimum values in all three color bands and applied uniformly to all three as a global operation. The “canonical” gain/offset accepts some clipping of
extreme high and low values but provides superior rendition with minimal loss of visual information (c_3 = 80).




[Fig. 10 panel label: retinex output.]

Fig. 10. Results for diverse test images: stochastic and deterministic; computer-generated; natural; and false-color (c_3 = 80 pixels).


processed image to the scene. While quantitative measures do 
exist for comparing input/output images, these do not capture 
the essential quality of visual significance. An abundance of 
psychophysical research underlines the central role of context 
in visual significance as well as the type of visual phenomena. 
We would like to admit and accept any distortion that the 
eye-brain does not find disturbing or perceptible. So we are 
left at this time with a reliance upon only visual perception 
in assessing the rendition in these experiments in retinex 
processing. From our own visual experience, we make the 
following statements about human visual perception: 


1) The dynamic range compression of shadows is related 
to the visual extent of the shadow. Larger shadows are 
more compressed than smaller shadows, i.e., the surfaces 
in larger shadows are lighter than those same surfaces 
in much smaller shadows. 

2) Lightness constancy seems less strong than color constancy.
Hue and saturation of colors seem less affected
by lighting variations than absolute gray scale.
In complex natural scenes the perception of color within 
shadows is not affected significantly, but the perception 
of lightness is. 
[Fig. 11 panels: test scene; retinex at two scales (c_3 = 80 pixels and c_3 = 15 pixels).]

Fig. 11. Results for a test scene with multiple color illuminants and familiar test targets that allows comparison of retinex performance to direct human
observation of the scene. Test targets on the left are predominantly in fluorescent illumination, while the middle set is in daylight, and the ones on the
right are in tungsten light. For direct human observation, there is no shadow perceived at the top, no greenish overall tint observed (due to fluorescent
illumination), and excellent color constancy of test targets.


From psychophysical research [11], we add:

3) Lightness constancy is primarily the preservation of rel- 
ative gray-level relationships even though the sensations 
of absolute lightness slide up and down to some extent 
with lighting variations. 

Therefore, we look for these same effects in the results of 
retinex processing. 

All the images used in subsequent testing are 512 × 512
pixels, three spectral bands, with 8 bits per band. The test scene
image (Fig. 11) was acquired using Ektachrome slide film
and then digitized using a high-resolution slide scanner. All
color prints were printed on a Kodak XLT7720 continuous-tone
printer with γ = 1.5 to compensate for the printer’s
nonlinear transfer function.

We begin by showing a range of diverse images (Fig. 10) 
for which this retinex produces good results, and include a 
false color LANDSAT image (i.e., green, red, and infrared are
spectrally translated to blue, green, and red). Also included
is a computer-generated stochastic image. This image is constructed
as a Poisson distribution of edges around a selectable
mean spatial detail-parameter and intensity levels [12] that are
Gaussian distributed in the three color bands. The case shown 
is for a high degree of spatial detail. The low signal values of 
the original LANDSAT image are accurately portrayed. 

While these results are encouraging and support the hy- 
pothesis that this retinex performs well on a wide array of 
images, we felt it necessary to go further and construct a 
test scene (Fig. 11) that combines mixed color illuminants,
variations, and familiar colors in multiple locations. This test
allows us to compare the processed image to the scene and,
to an extent, to convey this comparison to the reader for a
case with visually severe defects. Our direct observation of
this scene does not contain any sense of the shadow at the 
top of the image, and color constancy of the color charts and 
gray scales is almost complete. Likewise, direct observation 
contains no sense of the greenish tint that dominates the raw 
image and is due to the predominant fluorescent illumination. 


The optimized single scale retinex result falls short of human 
observation but succeeds in producing the correct beige scene 
color and some dynamic range compression of the shadow. 
The defects in the single scale retinex are the imperfect 
local color constancy (some of which is due to insufficient 
dynamic range in the raw image) and insufficient dynamic 
range compression of the shadow. A much smaller scale 
retinex (c_3 = 15 pixels) produces excellent dynamic range
compression and local color constancy (to the limit of the 
original image). This suggests that the two scales produce 
complementary visual information and that a multiple-scale 
retinex should more closely approach the performance of 
human vision. This experiment dramatically convinced us of 
the importance of test scenes and comparison to direct human 
observation. Without that, we would have had no way of 
knowing that the prominent shadow was not really evident 
to the human observer or that local color constancy of the 
test targets was so perfect for human perception. The retinex 
processing seems capable of producing a rendition far closer 
to our direct observation than the unprocessed image. 

We also explored test images (Fig. 12) with zonal and 
global “gray-world” violations, i.e., spatially averaged relative 
spectral reflectance values are clearly not equal in the three 
color spectral bands. Mathematically, it is clear that errors are 
produced by retinex processing for these cases, but we wished 
to understand the visual impact of these errors for a variety 
of cases. The common thread in these retinex images is that 
“middle gray” is an error and transmits the message — “local 
equals regional context.” An intuitive remedy seems to be to 
expand “middle gray” regions to larger space constants and, ul- 
timately, to replace the log surrounds with the log of the global 
means (Fig. 12, bottom). This does correct for zonal gray- 
world violations but clearly not for global violations (Mars 
surface and green checkerboard images). The Mars surface 
image is especially instructive as a near-global gray-world 
violation. The correct color appears only at chromatic edges 
but not at lightness edges. This suggests the possible benefit of 
a chromatic/lightness segmentation and a “filling in” operation 
Fig. 12. Results for images with noticeable artifacts due to zonal or global violations of the gray-world assumption and the substitution of log global
mean values for log surround to correct regional gray-world violations (c_3 = 80 pixels).


[6] from chromatic edges only. The truly global gray-world 
violation of the green checkerboard image suggests that upon 
detection of no chromatic edges the processing should retreat 
to log image (equivalent to human color perception’s “aperture 
mode” [13]). Ultimately, we expect there to be a way to 
detect and correct these error cases. Since the retinex can 
most fundamentally be understood as exchanging illumination 
dependencies for contextual dependency, we anticipate that 
the solution to this problem lies in the analysis of the large 
zonal context at the scale of the surround function. This is 
obviously the central problem that limits the retinex’s general
application, but our results also indicate that (in the form we 
tested) the retinex performs well on a rather wide array of both 
natural and computer-generated images. 
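
Why a global gray-world violation defeats the computation, even with the log of the global mean as the surround term, can be seen on a synthetic example. The sketch below is illustrative only: the all-green checkerboard and its band intensities are made-up values, and `retinex_global_mean` is a hypothetical helper implementing the global-mean substitution described above.

```python
import math

def retinex_global_mean(band):
    # Variant discussed in the text: log of the pixel minus log of the global mean.
    pixels = [v for row in band for v in row]
    mean = sum(pixels) / len(pixels)
    return [[math.log(v) - math.log(mean) for v in row] for row in band]

def checkerboard(base, size=4):
    # Light and dark squares of one hue: intensity is base or base / 2.
    return [[base if (x + y) % 2 == 0 else base / 2.0 for x in range(size)]
            for y in range(size)]

# Hypothetical all-green checkerboard: strong green band, weak red and blue.
scene = {"R": checkerboard(20.0), "G": checkerboard(180.0), "B": checkerboard(30.0)}
output = {name: retinex_global_mean(band) for name, band in scene.items()}

for name in ("R", "G", "B"):
    print(name, [round(v, 4) for v in output[name][0]])
```

Each band is divided by its own mean, so the three outputs are identical and the displayed checkerboard is gray: the lightness pattern survives but the green hue is lost. This is consistent with the observation that the global-mean substitution corrects zonal violations but not global ones.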

We do feel, however, that the final treatment of the retinex 
triplets prior to display would benefit from some additional 
“tinkering,” such as a fairly restrained nonlinear intensity
transformation. The “canonical” gain/offset is surprising in 
view of the fact that the retinex output is proportional to the 
log of reflectance ratios. 

IV. Discussion 

Our findings raise several questions with respect to both 
natural and machine vision. Perhaps the most interesting is the 
placement of the log function after surround formation. This is 
completely contrary to the measured approximate logarithmic 
response of cone photoreceptors and the design of Mead’s 
silicon retina [10], which was based on those measurements. 


An examination of recent measurements of primate cones [14] 
reveals that, while the electrical probe is sampling a single 
cone, the cone is intact in a small patch of retina. Therefore, 
we wonder to what degree the measurements may reflect a 
“network” response rather than just the cone response. 

Another possible explanation is that other higher level 
nonlinear operations might serve to diminish or correct the 
emphatic halo artifact of the initial log response. The filling-in 
mechanism [6] is a possible candidate for this, and could also 
be responsible for correcting errors due to gray-world viola- 
tions. In any event, it seems reasonable to reconsider a linear 
photoresponse for machine vision applications especially in 
view of the wide dynamic range available from current charge- 
coupled device (CCD) detector arrays. Clearly for dynamic 
range compression, the log function must be applied prior 
to any significant bottleneck. For the retina, this bottleneck 
appears to be the ganglion cells that transmit from the retina 
to the lateral geniculate nucleus of the brain. 

It has not been possible to fully reconcile Land’s retinex 
with the neurophysiology of the primate retina. Receptive 
fields are invariably found to be spectrally opponent. Math- 
ematically (and regardless of log placement), this is clearly 
not color constant since the spectral ratios of the illumi- 
nant variables do not cancel. Land [15] proposed that linear 
transformations, much like television’s red-green-blue (RGB) 
to hue-saturation-value (HSV), were a workable resolution 
which, in combination with his center/surround, does result in 
a color constant system as

R_{GR} = \log S_G r_G - \log \bar{S}_R \bar{r}_R    (12)

R_{RG} = \log S_R r_R - \log \bar{S}_G \bar{r}_G    (13)

where R_{GR} is the “green minus red” spectrally opponent
retinex, and R_{RG} is the “red minus green” opponent form.
These can be combined to form

R^{DO}_{GR} = R_{GR} + R_{RG}    (14)

            = \log S_G r_G + \log S_R r_R - \log \bar{S}_R \bar{r}_R - \log \bar{S}_G \bar{r}_G    (15)

where R^{DO}_{GR} is the double opponent “green minus red” retinex,
which is color constant because

R^{DO}_{GR} = \log \frac{r_G r_R}{\bar{r}_G \bar{r}_R}    (16)

when S_i = \bar{S}_i for the ith spectral channel.
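
The cancellation of the illuminant in the double opponent form can be verified numerically. The following sketch is an illustration under assumptions, not part of the original derivation: it reads the second term of each opponent signal as a surround (spatially averaged) quantity, takes the illuminant to be the same at center and surround, and uses made-up reflectances and illuminant strengths.

```python
import math

def opponent_terms(S, r_center, r_surround):
    # S: illuminant strength per channel, taken to be the same at the center
    # and in the surround (the condition S_i = S_bar_i in the text).
    r_gr = math.log(S["G"] * r_center["G"]) - math.log(S["R"] * r_surround["R"])  # (12)
    r_rg = math.log(S["R"] * r_center["R"]) - math.log(S["G"] * r_surround["G"])  # (13)
    return r_gr, r_rg, r_gr + r_rg                                                # (14)

# Hypothetical reflectances at the center pixel and averaged over the surround.
r_center = {"R": 0.6, "G": 0.3}
r_surround = {"R": 0.4, "G": 0.5}

# Two hypothetical illuminants with different red/green balance.
daylight = {"R": 1.0, "G": 1.0}
tungsten = {"R": 1.8, "G": 0.7}

gr_d, rg_d, do_d = opponent_terms(daylight, r_center, r_surround)
gr_t, rg_t, do_t = opponent_terms(tungsten, r_center, r_surround)

# Right-hand side of (16): depends on reflectances only.
expected = math.log((r_center["G"] * r_center["R"]) /
                    (r_surround["G"] * r_surround["R"]))

print(f"single opponent R_GR: {gr_d:.4f} (daylight) vs {gr_t:.4f} (tungsten)")
print(f"double opponent:      {do_d:.4f} (daylight) vs {do_t:.4f} (tungsten)")
print(f"log ratio in (16):    {expected:.4f}")
```

The single opponent signal changes with the illuminant because the ratio S_G / S_R does not cancel, whereas the sum in (14) contains each illuminant term once with each sign and reduces to the reflectance ratio of (16).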

Likewise, for a blue-yellow double-opponent form:

R_{BY} = \log S_B r_B - \log \bar{S}_Y \bar{r}_Y    (17)

where R_{BY} is a “blue minus yellow” spectral opponency
retinex, and

S_Y r_Y = c_1 S_R r_R + c_2 S_G r_G    (18)

and c_1, c_2 are weighting constants. Note that the placement of
the log is important here, since

\log S_Y r_Y \neq \log c_1 S_R r_R + \log c_2 S_G r_G.    (19)


Analogously to the “green minus red” case,

R^{DO}_{BY} = \log \frac{r_B r_Y}{\bar{r}_B \bar{r}_Y}    (20)

where R^{DO}_{BY} is the double opponent “blue minus yellow”
retinex, which is likewise color constant. For the color constant
lightness channel, the lightness “center,” R_L, is given by

R_L = \log c_3 S_B r_B + \log c_4 S_G r_G + \log c_5 S_R r_R    (21)

and the lightness surround, \bar{R}_L, is given by

\bar{R}_L = \log c_3 \bar{S}_B \bar{r}_B + \log c_4 \bar{S}_G \bar{r}_G + \log c_5 \bar{S}_R \bar{r}_R.    (22)

Again, log placement is important, since

\log \bar{S}_B \bar{r}_B \neq \overline{\log S_B r_B}    (23)

which leads to a color constant lightness center/surround
retinex as

R_L - \bar{R}_L = \log \frac{r_B r_G r_R}{\bar{r}_B \bar{r}_G \bar{r}_R}    (24)


Previous work [3], [6], [15] indicates that perceptual color 
constancy is consistent with noncolor constant early vision 
signals up to and perhaps including the striate cortex with 
the first clearcut evidence of color constancy in V4 cortex 


(downstream in the processing pathways from the striate cortex 
but prior to the full perceptual constructs, which appear to 
occur in the inferotemporal and parietal cortices).

On the whole, we are impressed by the performance of this 
retinex on wide ranging natural and test images even with 
the shortcomings of the gray-world assumption that show up 
as a significant perceptual distortion in certain of our test 
images. We are encouraged that these “error” cases appear 
to be detectable, and therefore may be minimized or corrected 
by some simple extension of this retinex. We feel that this 
extension can be based upon the fundamental mechanism of 
the retinex, which is to exchange illumination variations for 
context relationships and is likely to require a multiple scale 
approach. 

While we have not yet explored the relationship of retinex 
operations to image coding schemes, there is certainly an 
important connection. To the extent that a retinex operation 
is a general “front-end” computation for implementation in 
cameras, the retinex outputs become the inputs for image 
coding. The dynamic range compression aspect of the retinex 
does restrict signal variances as well as preserving scene 
information that would otherwise be lost to saturation or dark 
clipping. Dynamic range compression has been found to be 
broadly beneficial [12] for image coding. 

V. Conclusions 

In the course of defining a specific form for the cen- 
ter/surround retinex, we encountered several fundamental is- 
sues that had not been fully resolved by previous investiga- 
tions. These were: 

1) placement of the log function; 

2) functional form of the surround; 

3) size of the surround space constant; 

4) treatment of the retinex outputs prior to final display.

The examination of these issues with experimental image
processing led us to define a specific retinex that is different
from previous versions. Our version consists of: 

1) placement of the log function after surround formation;

2) use of the Gaussian form for the surround (although an 
exponential form is also a good choice); 

3) a space constant of about 80 pixels as a reasonable 
compromise between dynamic range compression and 
rendition. (Better rendition can be achieved with even 
larger space constants, but at the expense of detail in 
shadow zones. This trade-off between compression and 
rendition is a property of the retinex); 

4) a “canonical” gain/offset for the final treatment of retinex 
output signals. 

It remains to generalize the retinex processing to handle 
gray-world violations and refine the final treatment of the 
retinex outputs. Even so, we are encouraged by the overall 
performance of this retinex — that it combines dynamic range 
compression, color constancy, and lightness/color rendition. 
The trade-off between dynamic range compression and color 
rendition, that is governed by the surround space constant, sug- 
gests a multiscale approach to generalizing retinex processing. 



An implementation in analog VLSI or digital VLSI computer 
chips is an exciting possibility for realizing “smart” cameras
of the future. 
Abstract: Direct observation and recorded color images of
the same scenes are often strikingly different because human 
visual perception computes the conscious representation with 
vivid color and detail in shadows, and with resistance to spectral 
shifts in the scene illuminant. A computation for color images 
that approaches fidelity to scene observation must combine dy- 
namic range compression, color consistency — a computational 
analog for human vision color constancy — and color and lightness 
tonal rendition. In this paper, we extend a previously designed 
single-scale center/surround retinex to a multiscale version that 
achieves simultaneous dynamic range compression/color consis- 
tency/lightness rendition. This extension fails to produce good 
color rendition for a class of images that contain violations of 
the gray-world assumption implicit to the theoretical foundation
of the retinex. Therefore, we define a method of color restoration 
that corrects for this deficiency at the cost of a modest dilution 
in color consistency. Extensive testing of the multiscale retinex 
with color restoration on several test scenes and over a hundred 
images did not reveal any pathological behavior. 


I. Introduction

A COMMON (and often serious) discrepancy exists be- 
tween recorded color images and the direct observation 
of scenes (see Fig. 1). Human perception excels at constructing
a visual representation with vivid color and detail across the 
wide ranging photometric levels due to lighting variations. In 
addition, human vision computes color so as to be relatively 
independent of spectral variations in illumination [1]; i.e., it is 
color constant. The recorded images of film and electronic 
cameras suffer, by comparison, from a loss in clarity of 
detail and color as light levels drop within shadows, or 
as distance from a lighting source increases. Likewise, the 
appearance of color in recorded images is strongly influenced 
by spectral shifts in the scene illuminant. We refer to the 
computational analog to human vision color constancy as color 
consistency. When the dynamic range of a scene exceeds 
the dynamic range of the recording medium, there is an 
irrevocable loss of visual information at the extremes of 
the scene dynamic range. Therefore, improved fidelity of 
color images to human observation demands i) a computation 
that synthetically combines dynamic range compression, color 
consistency, and color and lightness rendition, and ii) wide
dynamic range color imaging systems. The multiscale retinex 
(MSR) approaches the first of these goals. The design of 
the computation is tailored to visual perception by comparing 
the measured photometry of scenes with the performance of 
visual perception. This provides a rough quantitative measure 
of human vision’s dynamic range compression — approaching 
1000 : 1 for strong illumination variations of bright sun to deep 
shade. 

The idea of the retinex was conceived by Land [2] as a
model of the lightness and color perception of human vision.
Through the years, Land evolved the concept from a random
walk computation [3] to its last form as a center/surround
spatially opponent operation [4], which is related to the
neurophysiological functions of individual neurons in the 
primate retina, lateral geniculate nucleus, and cerebral cortex. 
Subsequently, Hurlbert [5]-[7] studied the properties of this 
form of retinex and other lightness theories and found that they 
share a common mathematical foundation but cannot actually 
compute reflectance for arbitrary scenes. Certain scenes violate 
the “gray-world” assumption — the requirement that the aver- 
age reflectances in the surround be equal in the three spectral 
color bands. For example, scenes that are dominated by one 
color — “monochromes” — clearly violate this assumption and 
are forced to be gray by the retinex computation. Hurlbert 
further studied the lightness problem as a learning problem 
for artificial neural networks and found that the solution had 
a center/surround spatial form. This suggests the possibility 
that the spatial opponency of the center/surround is, in some 
sense, a general solution to estimating relative reflectances 
for arbitrary lighting conditions. At the same time, it is 
equally clear that human vision does not determine relative 
reflectance, but rather a context-dependent relative reflectance 
since the same surfaces in shadow and light do not appear 
to be the same. Moore et al. [8], [9] took up the retinex
problem as a natural implementation for analog very large
scale integration (VLSI) resistive networks and found that
color rendition was dependent on scene content: whereas
some scenes worked well, others did not. These studies also 
pointed out the problems that occur due to color Mach bands 
and the graying-out of large uniform zones of color. 

We have previously defined a single-scale retinex [10] 
(SSR) that can either provide dynamic range compression 
(small scale), or tonal rendition (large scale), but not both 
simultaneously. The multiscale retinex with color restoration 
(MSRCR) combines the dynamic range compression of the 
small-scale retinex and the tonal rendition of the large scale 
Fig. 1. Illustration of the discrepancy between color images and perception. The right image is a much closer representation of the visual 
impression of the scene. 


retinex with a universally applied color restoration. This color 
restoration is necessary to overcome the problems that the 
MSR has in the rendition of scenes that contain gray-world 
violations. It merges all the necessary ingredients to approx- 
imate the performance of human vision with a computation 
that is quite automatic and reasonably simple. These attributes 
make the MSRCR attractive for smart camera applications, 
in particular for wide dynamic range color imaging systems. 
For more conventional applications, the MSRCR is useful 
for enhancing 8-b color images that suffer from lighting 
deficiencies commonly encountered in architectural interiors 
and exteriors, landscapes, and nonstudio portraiture. 

Most of the emphasis in previous studies has been on the 
color constancy property of the retinex, but its dynamic range 
compression is visually even more dramatic. Since we want to 
design the retinex to perform in a functionally similar manner 
to human visual perception, we begin with a comparison of 
the photometry of scenes to their perception. This defines (at 
least in some gross sense) the performance goal for the retinex 
dynamic range compression. 

An apparent paradox has been brought to our attention by a 
colleague as well as a reviewer. This paradox is so fundamental 
that it requires careful consideration before proceeding. The 
question, simply stated, is why should recorded images need 
dynamic range compression, since the compression of visual 
perception will be performed when the recorded image is 
observed? First we must state categorically that recorded 
images with significant shadows and lighting variations do 
need compression. This has been our experience in comparing 
the perception of recorded images with direct observation for 
numerous scenes. Therefore, we have to conclude that the 
dynamic range compression for perception of the recorded 
images is substantially weaker than for the scene itself. Fig. 1 
is a case in point. There is no linear representation of this 


image, such as the viewing of the image on a gamma-corrected 
cathode ray tube (CRT) display, which even comes close to 
the dynamic compression occurring during scene observa- 
tion. The same is true for all scenes we have studied with 
major lighting variations. We offer the possible explanation 
that weak dynamic range compression can result from the 
major differences in angular extent between scene and image 
viewing. Image frames are typically about 40° in angular 
extent for a 50 mm film camera. These same frames are 
usually viewed with about a 10° display or photographic 
print. Furthermore, the original 40° frame is taken out of 
the larger context, which would be present when observing 
the scene directly. The dynamic range compression of human 
vision is strongly dependent upon the angular extent of visual 
phenomena. Specifically, compression is much stronger for 
large shadow zones than for smaller ones. We feel that this is 
a plausible resolution for this apparent paradox, and are 
certainly convinced by considerable experience that recorded 
images do need computational dynamic range compression for 
scenes that contain significant lighting variations. Likewise, 
this explanation applies to color consistency. 

Since the nonlinear nature of the MSR makes it almost 
impossible to prove its generality, we provide the results of 
processing many test images as a measure of confidence in 
its general utility and efficacy. Results obtained with test 
scenes — i.e., where direct observation of the subject of the 
image is possible — are given more weight because the per- 
formance of the computation can be compared directly to 
observation of the scene. 

II. The Photometry of Scenes Compared to Perception 

We approached learning more about the dynamic range 
compression in human vision by exploring the perceptual and 
photometric limits. We did this by selecting and measuring 
TABLE I 

Photometry of Scenes 

Scene                                               Luminance (cd/m^2) 
Visual saturation (white clouds near sun)           49,000 
Just below saturation (clouds further from sun)     18,000-37,000 
Outdoor building facade, bright sun                 7000-13,000 
Blue sky, morning                                   4600 
Concrete sidewalk in 
    sun                                             3200 
    shadow                                          570 
    deep shadow                                     290 
Interior conference room, fluorescent lighting 
    Floor/walls                                     36-140 
    Shadows                                         4-18 
Interior conference room, unlit but by window 
    Walls                                           29 
    Shadows                                         6 
Inside open closet                                  1 


scenes with increasingly emphatic lighting variations and then 
examining the point at which dynamic range compression 
gives way to loss of visual information. In other words, 
we looked for the dynamic range extremes at which human 
vision either saturates or clips the signals from very dark 
zones in a scene. We used a photographic spotmeter for 
the photometric measurements. In addition, we attempted to 
calibrate the perceptual lightness difference that occurs when 
the same surface is viewed in direct sunlight and in shadow. To 
quantify this difference, we compared the perceived lightness 
under both conditions to a reference gray-scale in direct sun 
and asked the question: Which gray scales match the surface in 
sun and shadow? Whereas the extreme measurements provide 
information about where dynamic range compression becomes 
lossy, the sun/shadow/gray-scale matches give some measure 
of the dynamic range compression taking place within more 
restricted lighting changes. 

The results of the photometric measurements are given in 
Table I. The conditions shown are representative of the wide 
dynamic range encountered in many everyday scenes. Scene 
visibility is good except under the most extreme lighting 
conditions. On the low end, visibility is quite poor at 1 
candela/m^2 (cd/m^2) luminance but improves rapidly as light 
levels approach 10 cd/m^2. Detail and color are quite easily 
visible across the range of 10-10000 cd/m^2, even when all 
occur together in a scene. We can therefore conclude that dynamic 
range compression within a scene can approach 1000 : 1, 
but becomes lossy for wider ranges. For low luminance, color 
and detail are perceptually hazy with a loss of clarity; and 
for extremely low levels of luminance (approaching 10000 : 1 
when compared with direct sunlight), all perception of color 
and detail is lost. 
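
As a rough illustration of what these ratios imply for a digital sensor, the sketch below converts a luminance ratio into the number of bits a linear quantizer would need to span it. Linear quantization and the helper name `bits_for_ratio` are assumptions made for this example; they are not part of the measurements.

```python
import math

def bits_for_ratio(ratio):
    """Smallest number of bits whose 2**bits linear levels span ratio:1."""
    return math.ceil(math.log2(ratio))

# Ratios quoted in the text: compression approaches 1000:1, and
# perception of color and detail is lost near 10000:1.
for ratio in (1000, 10000):
    print(f"{ratio}:1 needs about {bits_for_ratio(ratio)} bits")
```

Under this assumption a 1000 : 1 scene already needs about 10 bits, which is more than the 8 bits per band of conventional digitization.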

We can also quantitatively estimate from this data the 
difference between perception and photometry for a very 
commonly encountered case: objects in sun and shadow. 
The light level in a shadow is usually between 10% and 20% 
of the sunlit value, depending on the 
depth of the shadow. We compared the perceived drop in 


lightness to a reflectance gray-scale and concluded that the 
perceptual decrease is only about 50% of the sunlit lightness 
value. This clearly demonstrates the large discrepancy between 
recorded images and perception, even for conditions that do 
not encompass a very wide dynamic range. This data implies 
that for 10:1 changes in lighting, the perception of these 
changes is about 3-5 : 1 to minimize the impact of lighting 
on the scene representations formed by consciousness. Hence, 
as simple and ubiquitous an event as a shadow immediately 
introduces a major discrepancy between recorded images and 
visual perception of the same scene. This sets a performance 
goal derived from human visual perception with which to test 
the retinex. Clearly, a very strong nonlinearity exists in human 
vision, although our experiments cannot define the exact form 
of this neural computation. 
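
The sun/shadow figures can be checked against the concrete sidewalk rows of Table I. The short sketch below only repeats that arithmetic; the dictionary and function names are chosen for this example.

```python
# Luminance values in cd/m^2, copied from the concrete sidewalk rows of Table I.
SIDEWALK = {"sun": 3200.0, "shadow": 570.0, "deep shadow": 290.0}

def fraction_of_sunlit(condition):
    """Luminance of `condition` as a fraction of the sunlit value."""
    return SIDEWALK[condition] / SIDEWALK["sun"]

for condition in ("shadow", "deep shadow"):
    frac = fraction_of_sunlit(condition)
    print(f"{condition}: {100 * frac:.0f}% of sunlit, ratio {1 / frac:.1f}:1")
```

The measured shadow values come out at roughly 18% and 9% of the sunlit luminance, close to the 10-20% range quoted above, whereas the perceived change reported in the text is only about 3-5 : 1 for a 10 : 1 change in lighting.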

III. Construction of a Multiscale 
Center/Surround Retinex 

The single-scale retinex [10]-[12] is given by 

$$R_i(x,y) = \log I_i(x,y) - \log[F(x,y) * I_i(x,y)] \tag{1}$$

where $R_i(x,y)$ is the retinex output, $I_i(x,y)$ is the image 
distribution in the $i$th spectral band, "$*$" denotes the convolution 
operation, and $F(x,y)$ is the surround function 

$$F(x,y) = K e^{-r^2/c^2}$$

where $c$ is the Gaussian surround space constant, and $K$ is 
selected such that 

$$\iint F(x,y)\,dx\,dy = 1.$$
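
A minimal NumPy sketch of (1) for a single spectral band follows. It is not the implementation used for the results in this paper. It assumes $r^2 = x^2 + y^2$, natural logarithms, circular (wrap-around) boundary handling through the FFT, and a small offset `eps` to avoid taking the logarithm of zero; the function names are hypothetical.

```python
import numpy as np

def gaussian_surround(shape, c):
    """Surround F(x, y) = K * exp(-r^2 / c^2), with K chosen so F sums to 1."""
    h, w = shape
    # Wrap-around distances, so the kernel is centered at index (0, 0).
    y = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    x = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    f = np.exp(-(x**2 + y**2) / float(c) ** 2)
    return f / f.sum()

def single_scale_retinex(band, c, eps=1.0):
    """R = log(I) - log(F * I) for one band; '*' is circular convolution."""
    band = np.asarray(band, dtype=np.float64) + eps
    f = gaussian_surround(band.shape, c)
    surround = np.real(np.fft.ifft2(np.fft.fft2(band) * np.fft.fft2(f)))
    # Guard against tiny negative values caused by floating-point round-off.
    surround = np.maximum(surround, np.finfo(float).tiny)
    return np.log(band) - np.log(surround)
```

Because the output is the logarithm of a center/surround ratio, multiplying a strictly positive band by a constant (with `eps=0.0`) leaves the result unchanged, and a uniform band maps to zero.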

The MSR output is then simply a weighted sum of the outputs 
of several different SSR outputs. Mathematically, 

$$R_{\mathrm{MSR}_i} = \sum_{n=1}^{N} w_n R_{n_i} \tag{2}$$

where $N$ is the number of scales, $R_{n_i}$ is the $i$th component 
of the $n$th scale, $R_{\mathrm{MSR}_i}$ is the $i$th spectral component of the 
MSR output, and $w_n$ is the weight associated with the $n$th 
scale. The only difference between $R(x,y)$ and $R_n(x,y)$ is 
that the surround function is now given by 

$$F_n(x,y) = K e^{-r^2/c_n^2}.$$

A new set of design issues emerges for the design of the 
MSR in addition to those for the SSR [10]. This has primarily 
to do with the number of scales to be used for a given 
application, and how these realizations at different scales 
should be combined. Because experimentation is our only 
guide in resolving these issues, we conducted a series of tests 
starting with only two scales and adding further scales as 
needed. After experimenting with one small scale ($c_n < 20$) 
and one large scale ($c_n > 200$), the need for a third intermediate 
scale was immediately apparent in order to produce a 
graceful rendition without visible “halo” artifacts near strong 
edges. Experimentation showed that equal weighting of the 
scales ($w_n = 1/3$, $n = 1, 2, 3$) was sufficient for most 
Fig. 2. Components of the multiscale retinex that show their complementary information content (panels: 15 pixels, 250 pixels, multiscale). The smallest scale is strong on detail and dynamic 
range compression and weak on tonal and color rendition. The reverse is true for the largest spatial scale. The multiscale retinex combines the strengths 
of each scale and mitigates the weaknesses of each. 


applications. Weighting the smallest scale heavily to achieve 
the strongest dynamic range compression in the rendition leads 
to ungraceful edge artifacts and some graying of uniform color 
zones. 
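
The weighted sum of (2) can be sketched as follows, given per-scale retinex outputs that have already been computed for one band. The function name and the choice to default to equal weights $1/N$ are assumptions of this example; equal weighting follows the experimental finding reported above.

```python
import numpy as np

def multiscale_retinex(ssr_outputs, weights=None):
    """Weighted sum over scales, R_MSR = sum_n w_n * R_n, for one band."""
    stack = np.stack([np.asarray(r, dtype=np.float64) for r in ssr_outputs])
    n_scales = stack.shape[0]
    if weights is None:
        # Equal weighting, w_n = 1/N.
        weights = np.full(n_scales, 1.0 / n_scales)
    weights = np.asarray(weights, dtype=np.float64)
    if weights.shape != (n_scales,):
        raise ValueError("need exactly one weight per scale")
    # Contract the scale axis of the stack against the weight vector.
    return np.tensordot(weights, stack, axes=1)
```

Passing a weight vector that favors the smallest scale reproduces the trade-off described above: stronger compression at the cost of edge artifacts and graying.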

To test whether the dynamic range compression of the MSR 
approaches that of human vision, we used test scenes that we 
had observed in addition to test images that we had obtained 
from other test sources. The former allowed us to readily 
compare the processed image to the direct observation of 
the scene. Fig. 2 illustrates the complementary strengths and 
weaknesses of each scale taken separately and the strength 
of the multiscale synthesis. This image is representative of a 
number of test scenes (see Fig. 3) where for conciseness we 
show only the multiscale result. 

The comparison of the unprocessed images to the perception 
of the scene produced some striking and unexpected results. 
When direct viewing was compared with the recorded image, 
the details and color were far more vivid for direct viewing 
not only in shadowed regions, but also in the bright zones 


of the scene! This suggests that human vision is doing even 
more image enhancement than just strong dynamic range 
compression, and the MSR may ultimately need to be modified 
to capture the realism of direct viewing. Initially, we tackle the 
dynamic range compression, color consistency, and tonal/color 
rendition problems, while keeping in mind that further work 
may be necessary to achieve full realism. 

A sample of image data for surfaces in both sun and shadow 
indicates a dynamic range compression of 2 : 1 for the MSR 
compared to the 3-5 : 1 measured in our perceptual tests. 
For the SSR ($c = 80$) this value is 1.5 : 1 or less. These 
levels of dynamic range compression are for outdoor scenes 
where shadows have large spatial extent. Shadows of small 
spatial extent tend to appear “darker” and are more likely to 
be clipped in recorded images. Fig. 3 shows a high dynamic 
range indoor/outdoor scene. The foreground orange book on 
the gray-scale is compressed by approximately 5:1 for the 
MSR while compression for the SSR is only about 3:1, 
both relative to the bright building facade in the background. 
Fig. 3. Examples of test scenes processed with the multiscale retinex prior to color restoration. While color rendition of the left image is good, the 
other two are “grayed” to some extent. Dynamic range compression and tonal rendition are good for all and compare well with scene observation. Top 
row: Original. Bottom row: Multiscale retinex. 


The compression for human vision is difficult to estimate 
in this case, since both the color and texture of the two 
surfaces are quite different. Our impression from this analysis 
is that the MSR is approaching human vision's performance 
in dynamic range compression but not quite achieving it. For 
scenes with even greater lighting dynamics than these, we 
can anticipate an even higher compression for the MSR to 
match human vision. However, we are currently unable to 
test this hypothesis because the conventional 8-b analog-to- 
digital converters of both our solid-state camera and slide 
film/optical scanner digitizer restrict the dynamic range with 
which the image data for such scenes can be acquired. Solid 
state cameras with 12-b dynamic range and thermoelectrically 
cooled detector arrays with 14-b dynamic range are, however, 
commercially available, and can be used for examining the 
MSR performance on the wider dynamic range natural scenes. 
Even for the restricted dynamic range shown in Fig. 3 (left), 
it is obvious that limiting noise has been reached, and that 
much wider dynamic range image acquisition is essential for 
realizing a sensor/processing system capable of approximating 
human color vision. 

For the conventional 8-b digital image range, the MSR 
performs well in terms of dynamic range compression, but its 
performance on the pathological classes of images examined 
in previous SSR research [10] must still be examined. Fig. 4 
shows a set of images that contain a variety of regional and 


global gray-world violations. The MSR, as expected, fails 
to handle them effectively — all images possessing notable, 
and often serious, defects in color rendition (see Fig. 4, 
middle row). We only provide these results as a baseline for 
comparison with the color restoration scheme, presented in the 
next section, that overcomes these deficiencies of the MSR. 


IV. A Color Restoration Method 
for the Multiscale Retinex 

The general effect of retinex processing on images with 
regional or global gray-world violations is a “graying out” 
of the image, either globally or in specific regions. This 
desaturation of color can, in some cases, be severe (see 
Fig. 4, middle). More rarely, the gray-world violations can 
simply produce an unexpected color distortion (see Fig. 4, 
top left). Therefore, we consider a color restoration scheme 
that provides good color rendition for images that contain 
gray-world violations. We, of course, require the restoration to 
preserve a reasonable degree of color consistency, since that 
is one of the prime objectives of the retinex. Color constancy 
is known to be imperfect in human visual perception, so 
some level of illuminant color dependency is acceptable, 
provided it is much lower than the physical spectrophotometric 
variations. Ultimately, this is a matter of image quality, and 
Fig. 4. Pathological “gray-world” violations are not handled well by the multiscale retinex alone (middle row), but are treated successfully when color 
restoration is added (lower row). Top row: Original. 


color dependency is tolerable to the extent that the visual 
defect is not visually too strong. 

We begin by considering a simple colorimetric transform 
[13], even though it is often considered to be in direct 
opposition to color constancy models. It is also felt to describe 
only the so-called “aperture mode” of color perception, i.e., 
restricted to the perception of color lights rather than color 
surfaces [14]. The reason for this choice is simply that it 
is a method for creating a relative color space, and in so 
doing becomes less dependent than raw spectrophotometry 
on illuminant spectral distributions. This starting point is 
analogous to the computation of chromaticity coordinates 
where 


$$I'_i(x,y) = I_i(x,y) \Big/ \sum_{i=1}^{S} I_i(x,y) \tag{3}$$

for the $i$th color band, and $S$ is the number of spectral channels. 
Generally, $S = 3$, using the red-green-blue (RGB) color 
space. The modified MSR that results is given by 

$$R_{\mathrm{MSRCR}_i}(x,y) = C_i(x,y)\, R_{\mathrm{MSR}_i}(x,y) \tag{4}$$

where 

$$C_i(x,y) = f[I'_i(x,y)]$$

is the $i$th band of the color restoration function (CRF) in the 
chromaticity space, and $R_{\mathrm{MSRCR}_i}$ is the $i$th spectral band 
of the multiscale retinex with color restoration. In a purely 
empirical manner, we tried several linear and nonlinear color 
restoration functions on a range of test images. The function 
that provided the best overall color restoration was 

$$C_i(x,y) = \beta \log[\alpha I'_i(x,y)]
           = \beta \left\{ \log[\alpha I_i(x,y)] - \log\left[\sum_{i=1}^{S} I_i(x,y)\right] \right\} \tag{5}$$

where $\beta$ is a gain constant, and $\alpha$ controls the strength of 
the nonlinearity. In the spirit of preserving a canonical 
computation, we determined that a single set of values for 
$\beta$ and $\alpha$ worked for all spectral channels. The final MSRCR 
output is obtained by using a “canonical” gain/offset to transition 
between the logarithmic domain and the display domain. 
Looking at the forms of the CRF of (5) and the SSR of 
(1), we conjecture that the CRF represents a spectral analog 
to the spatial retinex. This mathematical and philosophical 
TABLE II 

List of Constants Used for One Particular Implementation of the 
MSRCR on a DEC Alpha 3000, Using the VMS F77 Compiler 

Constant   N    c_1   c_2   c_3   G     b     α     β    w_n 
Value      3    15    80    250   192   -30   125   46   1/3 


symmetry is intriguing, since it suggests that there may be a 
unifying principle at work. Both computations are nonlinear, 
contextual, and highly relative. We can speculate that the 
visual representation of wide dynamic range scenes must be a 
compressed mesh of contextual relationships for lightness and 
color representation. This sort of information representation 
would certainly be expected at more abstract levels of visual 
processing such as form information composed of edges, links, 
and the like, but is surprising for a representation so closely 
related to the raw image. Perhaps in some way this front-end 
computation can serve later stages in a presumed hierarchy of 
machine vision operations that would ultimately need to be 
capable of such elusive goals as resilient object recognition. 
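
The color restoration function of (5) can be sketched per pixel as below. This is an illustration under stated assumptions: natural logarithms, an image stored as a height x width x S array, and a small offset `eps` to avoid the logarithm of zero. The default values of `alpha` and `beta` are the ones listed in Table II; the function name is hypothetical.

```python
import numpy as np

def color_restoration(image, alpha=125.0, beta=46.0, eps=1.0):
    """C_i = beta * (log(alpha * I_i) - log(sum over bands of I)), per pixel."""
    img = np.asarray(image, dtype=np.float64) + eps
    # Sum over the spectral axis; keepdims lets it broadcast against each band.
    total = img.sum(axis=2, keepdims=True)
    return beta * (np.log(alpha * img) - np.log(total))
```

For a pixel whose bands are all equal, the chromaticity is $1/S$ in every band, so all bands receive the same factor. A band that dominates its pixel receives a larger factor than the others, which is what counteracts the graying described above.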

The bottom row in Fig. 4 shows the results of applying the 
CRF to the MSR output for pathological images. The MSRCR 
provides the necessary color restoration, eliminating the color 
distortions and gray zones evident in the MSR output. The 
challenge now is to prove the generality of this computation. 
Since there is not a mathematical way to do this, we have 
tested the computation on several hundred highly diverse 
images without discovering exceptions. Unfortunately, space 
considerations allow us to present only a very small subset of 
all the images that we have tested. 

V. Selected Results for Diverse Test Cases 

Extensive testing indicates that the gain constant $\alpha$ for 
the CRF and the final gain/offset adjustment required to 
transition from the logarithmic to the display domain are 
independent of the spectral channel and the image content. 
This implies that the method is general or “canonical,” and 
can be applied automatically to most (if not all) images 
without either interactive adjustments by humans or internal 
adjustments such as an auto-gain. This final version of the 
MSRCR can then be written as 

$$R_{\mathrm{MSRCR}_i}(x,y) = G\left[ C_i(x,y) \left\{ \log I_i(x,y) - \log[I_i(x,y) * F_n(x,y)] \right\} + b \right] \tag{6}$$

where G and b are the final gain and offset values, respec- 
tively. The constants G and b intrinsically depend upon the 
implementation of the algorithm in software. Table II gives 
a list of the constants used to produce all the outputs in this 
paper. 
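
The final gain/offset step of (6) can be sketched as below, taking the color restoration factor and the retinex output as precomputed arrays. Two points are assumptions of this sketch rather than statements from the text: the braced term of (6) is replaced by the multiscale output, following (4), and the result is rounded and clipped to the 8-b display range. The defaults are the Table II values, which the text ties to one particular implementation, so they may need retuning for a different logarithm base or input scaling.

```python
import numpy as np

def msrcr_output(crf, msr, gain=192.0, offset=-30.0):
    """Literal form of (6): G * (C_i * R_i + b), applied elementwise."""
    crf = np.asarray(crf, dtype=np.float64)
    msr = np.asarray(msr, dtype=np.float64)
    return gain * (crf * msr + offset)

def to_8bit(values):
    """Round and clip to 0..255; this display mapping is an assumption."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)
```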

We must again emphasize that the choice of all the constants 
merely represents a particular implementation that works well 
for a wide variety of images. In no way do we mean to imply 
that these constants are optimal or “best case” for all possible 
implementations of this algorithm. The choice of the surround 
space constants, the $c_n$, in particular does not seem to be critical. 
Instead, the choice seems to only need to provide reasonable 
coverage from local to near global. Likewise, the choice of us- 
ing three scales was made empirically to provide the minimum 
number of scales necessary for acceptable performance. 


The test images presented here begin with some test scenes 
since we feel it is fundamental to refer the processed images 
back to the direct observation of scenes. This is necessary to 
establish how well the computation represents an observation. 
Clearly, we cannot duplicate human vision's peripheral vision 
which spans almost 180°, but within the narrower angle 
of most image frames, we would like to demonstrate that 
the computation achieves the clarity of color and detail in 
shadows, reasonable color constancy and lightness and color 
rendition that is present in direct observation of scenes. The 
test scenes (see Fig. 5) compare the degree with which the 
MSRCR approaches human visual performance. All four of 
the MSRCR outputs shown in Fig. 5 are quite “true to life” 
compared to direct observation, except for the leftmost, which 
seems to require even more compression to duplicate scene 
perception. This image was scanned from a slide and digitized 
to 8-b/color. The other three images were taken with a Kodak 
DCS200C CCD detector array camera. In none of the cases 
could a gamma correction produce a result consistent with 
direct observation. Therefore, we conclude that the MSRCR 
is not correcting simply for a CRT display nonlinearity, and 
that far stronger compression than gamma correction is nec- 
essary to approach fidelity to visual perception of scenes with 
strong lighting variations. We did not match camera spatial 
resolution to observation very carefully, so some difference in 
perceived detail is expected and observed. However, overall 
color, lightness, and detail rendering for the MSRCR is a good 
approximation to human visual perception. 

The rest of the selected test images (Figs. 6-8) were ac- 
quired from a variety of sources (see acknowledgments) and 
provide as wide a range of visual phenomena as we felt 
could be presented within the framework of this paper. Little 
comment is necessary and we will leave the ultimate judgment 
to the reader. Some images with familiar colors and no strong 
lighting defects are included to show that the MSRCR does 
not introduce significant visual distortions into images that are 
without lighting variations. The white stripes of the American 
flag in Fig. 6(a) show a shift toward blue-green in the MSRCR 
output. This is, perhaps, analogous to the simultaneous color 
contrast phenomena of human perception. Moore et al. [8] 
noted a similar effect in their implementation of a different 
form of the retinex. The Paul Klee painting in Fig. 7(b) is 
included as a test of the subtlety of tonal and color rendition. 
Some of the test images with strong shadow zones where one 
or two color channels are preferentially clipped do exhibit a 
color distortion. This is due to the rather limited dynamic range 
of the “front-end” imaging/digitization, and is not an artifact of 
the computation. Even for these cases, the MSRCR produces 
far more visual information and is more “true-to-life” than the 
unprocessed image. The set of space images is included to 
show the application of the MSRCR to both space operations 
imagery and remote sensing applications. 

A further test is worthwhile in assessing the impact of the 
CRF on color consistency. The CRF, as expected, dilutes color 
consistency, as shown in Fig. 9. However, the residual color 
dependency is fairly weak and the visual impression of color 
shift is minimal especially in comparison with the dramatic 
shifts present in the unprocessed images. 
Fig. 5. Test scenes illustrating dynamic range compression, color, and tonal rendition, and automatic exposure correction. All processed images compare 
favorably with direct scene observation with the possible exception of the leftmost image, which is even lighter and clearer for observation. This scene has the widest 
dynamic range and suggests that even stronger dynamic range compression may be needed for this case. Top row: Original. Bottom row: Multiscale retinex. 



Fig. 6. Photographic examples further illustrating graceful dynamic range compression together with tonal and color rendition. The rightmost image 
shows the processing scheme handling saturated colors quite well and not distorting an image that is quite good in its original form. Top row: Original. 
Bottom row: Multiscale retinex. 


VI. Discussion 

While we have not yet conducted an extensive performance 
comparison of the MSRCR to other image enhancement meth- 
ods, we have done some preliminary tests of the MSRCR rel- 
ative to the simpler image enhancement methods — histogram 
equalization, gamma correction, and gain/offset manipula- 


tion [15], and point logarithmic nonlinearity [16]. Overall, 
the performance of the retinex is consistently good, while 
performance for the others is quite variable. In particular, 
the retinex excels when there are major zones of both high 
and low light levels. The traditional methods that we have 
compared against are all point operations on the image, 




Fig. 7. Miscellaneous examples illustrating fairly dramatic dynamic range compression as well as one for subtlety of color rendition (second from 
leftmost — painting by Paul Klee). Top row: Original. Bottom row: Multiscale retinex. 



Fig. 8. Selection of space images to show enhancement of space operations imagery and remote sensing data. Top row: Original. Bottom row: 
Multiscale retinex. 


whereas unsharp masking [17] and homomorphic filtering 
[17], [18] are spatial operations more mathematically akin to 
the center/surround operation of the retinex. Unsharp masking is 
a linear subtraction of a blurred version of the image from 
the original and is generally applied using slight amounts of 
blurring. For a given space constant for the surround, we would 
expect the retinex to be much more compressive. It is not 
clear that unsharp masking would have any color constancy 


property, since the subtraction process in the linear domain is 
essentially a highpass filtering operation and not a ratio that 
provides the color constancy of the retinex. 
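
The distinction between a linear difference and a logarithmic ratio can be illustrated in one dimension. In the sketch below a box blur stands in for the surround (an assumption; the retinex uses a Gaussian), and the same reflectance profile is lit at two illumination levels chosen arbitrarily for the example.

```python
import numpy as np

def box_blur(signal, width=5):
    """Normalized box average over `width` samples, without edge padding."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="valid")

def unsharp(signal, width=5):
    """Linear center minus surround (a highpass operation)."""
    half = width // 2
    return signal[half:len(signal) - half] - box_blur(signal, width)

def log_ratio(signal, width=5):
    """Logarithm of center over surround, as in the retinex."""
    half = width // 2
    center = signal[half:len(signal) - half]
    return np.log(center) - np.log(box_blur(signal, width))

reflectance = np.array([0.2, 0.2, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.9, 0.9])
dim, bright = 10.0 * reflectance, 1000.0 * reflectance
```

Here `log_ratio(dim)` and `log_ratio(bright)` agree, while `unsharp(bright)` is 100 times `unsharp(dim)`: the linear difference scales with the illuminant and the ratio does not.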

Homomorphic filtering is perhaps the closest computation 
to the MSRCR and in one derivation [19] has been applied 
to color vision. Both its original form and the color form rely 
upon a highpass filtering operation that takes place after the 
dynamic range of the image is compressed with a point log- 
Fig. 9. Toy scene revisited (panels: skylight, daylight, tungsten). A test of the dilution of color consistency by the color restoration. While color consistency was shown previously to be near 
perfect for the SSR and MSR, some sacrifice of this was necessary to achieve color rendition. While slight changes in color can be seen, color consistency is still 
quite strong relative to the spectrophotometric changes seen in the original images (top row). The blues and yellows in the color restored multiscale retinex 
(bottom row) are the most affected by the computer simulated spectral lighting shifts, but the effect is visually weak and most colors are not visibly affected. 


arithmic nonlinearity. An inverse exponentiation then restores 
the dynamic range to the original display space. The color 
vision version adds an opponent-color/achromatic transfor- 
mation after the application of the logarithmic nonlinearity. We 
have found that the application of the logarithmic nonlinearity 
before spatial processing gives rise to emphatic “halo” artifacts 
and have also shown that it is quite different visually and math- 
ematically from the application of the log after the formation 
of the surround signal [10]. Because of the nonlinearities in 
both the MSRCR and homomorphic filtering, a straightforward 
mathematical comparison is not possible. We do, however, 
anticipate significant performance differences between the two 
in terms of dynamic range compression, rendition, and, for the 
color vision case, color consistency. Another major difference 
between the MSRCR and homomorphic filtering is in the 
application of the inverse function in homomorphic filtering. 
The analogous operation in the MSRCR is the application of 
the final gain/offset. Obviously, the two schemes use quite 
different techniques in going from the nonlinear logarithmic 
to the display domain. We conjecture that the application of 
the inverse log function in the retinex computation would undo 
some of the compression it achieves. 
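
The mathematical difference between taking the logarithm after forming the surround and taking it before spatial processing can be seen on a one-dimensional step edge. The sketch below compares only the two surround terms, not a complete homomorphic filter, and uses a box blur in place of a Gaussian surround; both simplifications are assumptions of this example.

```python
import numpy as np

def box_blur(signal, width=9):
    """Normalized box average over `width` samples, without edge padding."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="valid")

# Step edge between a dark zone and a bright zone (arbitrary units).
signal = np.concatenate([np.full(20, 10.0), np.full(20, 1000.0)])

log_after_surround = np.log(box_blur(signal))    # retinex ordering: log[F * I]
log_before_surround = box_blur(np.log(signal))   # log applied first: F * log I

# The two agree in uniform zones and differ only near the edge.
gap = log_after_surround - log_before_surround
```

Because the blur weights are non-negative and sum to one, `gap` is never negative, and it is largest where the window straddles the edge. The two orderings therefore produce different surround signals exactly in the regions where halo artifacts are observed.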


One of the most basic issues for the use of this retinex is 
the trade-off between the advantages versus the introduction 
of context dependency on local color and lightness values. 
Our experience is that the gains in visual quality, which can 
be quite substantial, outweigh the relatively small context 
dependency. The context dependencies are perhaps of most 
concern in remote sensing applications. The strongest context 
dependencies occur for the dark regions that are low because 
of low scene reflectances — for example, large water areas in 
remote sensing data adjacent to bright land areas. The large 
zones of water are greatly enhanced and subtle patterns in 
them emerge. The retinex clearly distorts radiometric fidelity 
in favor of visual fidelity. The gains in visual information, we 
hope, have been demonstrated adequately in our results. Even 
for specific remote sensing experiments where radiometric 
fidelity is required, the retinex may be a necessary auxiliary 
tool for the visualization of overall patterns in low signal 
zones. Visual information in darker zones that may not be 
detected with linear representations which preserve radiometry 
will “pop out” with a clarity limited only by the dynamic range 
of the sensor front-end and any intervening digitization scheme 
employed prior to the retinex. This may be especially useful 
in visualizing patterns in remote sensing images covering 
land and water. Water has a much lower reflectance than 
land especially for false-color images including a near-infrared 
channel. The ability of the MSRCR to visualize features within 
both land and water zones simultaneously should be useful in 
coastal zone remote sensing. 

The retinex computation can be applied ex post facto on 8-b 
color images and all of the results presented here represent this 
application. We have noticed only one problem with this — that 
the retinex can and will enhance artifacts introduced by lossy 
coding schemes, most notably lossy JPEG. Hence, the retinex 
is best applied prior to lossy image coding. One obvious 
advantage that the MSRCR provides for image compression is 
its ability to compress wider dynamic ranges to 8-bit or less 
per band color output, while preserving, and even enhancing, 
the details in the scene. The overall effect then is a significant 
reduction in the number of bits (especially in cases where 
the original color resolution is higher than 8-b/band) required 
to transmit the original without a substantial loss in spatial 
resolution or contrast quality. 
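
As a rough worked example of the bit savings, the sketch below compares raw (uncoded) data volumes before and after reducing the per-band depth to 8 b. The image dimensions are hypothetical, and the calculation ignores any subsequent entropy or lossy coding.

```python
def raw_bits(width, height, bands, bits_per_band):
    """Raw size in bits of an uncompressed image."""
    return width * height * bands * bits_per_band


def fractional_reduction(source_bits_per_band, output_bits_per_band=8):
    """Fraction of raw bits saved by reducing the per-band depth."""
    return 1.0 - output_bits_per_band / source_bits_per_band


# Hypothetical 512 x 512 three-band image captured at 12 b/band.
before = raw_bits(512, 512, 3, 12)
after = raw_bits(512, 512, 3, 8)
saved_12_to_8 = fractional_reduction(12)  # one third of the raw bits
saved_10_to_8 = fractional_reduction(10)  # one fifth of the raw bits
```

The saving depends only on the ratio of bit depths, so it is the same for any image size and band count.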

The greatest power and advantage of the retinex is as 
a front-end computation, especially if the camera is also 
capable of wider than 8-b dynamic range. We have seen from 
scene photometry that 10-12-b dynamic ranges are required 
to encompass everyday scenes. Obviously, the retinex is most 
powerful as a front-end computation if it can be implemented 
within a sensor or between the sensor and coding/archival 
storage. We have not tested this retinex on wide dynamic range 
images, since we do not yet have access to an appropriate 
camera; therefore, for wider dynamic range images, some 
modifications in the processing may be anticipated. This may 
involve adding more scales, especially smaller ones, to provide 
a greater but still graceful dynamic range compression. 

We have encountered many digital images in our testing that 
are underexposed. Apparently, even with modern photographic 
autoexposure controls, exposure errors can and do occur. An 
additional benefit of the MSRCR is its capacity for exposure 
correction. Again, this is especially beneficial if it is performed 
as a front-end computation. 

We do have the sense from our extensive testing thus far 
that the MSRCR approaches the high degree of dynamic range 
compression of human vision but may not quite achieve a 
truly comparable level of compression. Our impression of 
the test scenes is that direct observation is still more 
vivid in terms of color and detail than the processed images. 
This could be due to limitations in display/print media, or it 
could be that the processing scheme should be further designed 
to produce an even more emphatic enhancement. Further 
experimentation comparing test scenes to processed images 
and an accounting for display/print transfer characteristics will 
be necessary to resolve this remaining question and refine the 
method if necessary in the direction of greater enhancement 
of detail and color intensity. The transfer characteristics of 
print/display media deserve further investigation since most 
CRT’s and print media have pronounced nonlinear properties. 
Most CRT’s have an inverse “gamma” response [17] and 
the specific printer that we have used (Kodak XLT7720 
thermal process) has a nonlinear response. For the printed 
results shown, we used a modest gamma correction (γ = 
1.2). While this does not represent an accurate inverse that 
linearizes the printer transfer function, it does capture the 
visual information with a reasonably good and consistent 
representation. Obviously, no matter how general purpose the 
MSRCR is, the highest quality results will still need to account 
for the specifics of print/display media, especially since these 
are so often nonlinear. 
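
A minimal sketch of a gamma correction of this kind is given below. The text does not state the exact convention used for the printed results, so the code assumes the common form out = in^(1/gamma) applied to values normalized to [0, 1], with gamma = 1.2 and 8-b data. The function name and defaults are illustrative.

```python
def gamma_correct(value, gamma=1.2, max_value=255):
    """Apply out = in ** (1 / gamma) to one sample in the range 0..max_value.

    The exponent convention is an assumption; some systems apply in ** gamma.
    """
    normalized = value / max_value
    return round(max_value * normalized ** (1.0 / gamma))


# Build an 8-b lookup table once, then apply it to every band of every pixel.
lookup = [gamma_correct(v) for v in range(256)]
```

Under this convention the endpoints are unchanged and mid-tones are lifted slightly, which is consistent with describing the correction as modest.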

VII. Conclusions 

The MSR, comprised of three scales (small, intermediate, 
and large), was found to synthesize dynamic range compression, 
color consistency, and tonal rendition, and to produce 
results that compare favorably with human visual perception, 
except for scenes that contain violations of the gray-world 
assumption. Even when the gray-world violations were not 
dramatic, some desaturation of color was found to occur. A 
color restoration scheme was defined that produced good color 
rendition even for severe gray-world violations, but at the 
expense of a slight sacrifice in color consistency. In retrospect, 
the form of the color restoration is a virtual spectral analog 
to the spatial processing of the retinex. This may reflect some 
underlying principle at work in the neural computations of 
consciousness; perhaps, even that the visual representation of 
lightness, color, and detail is a highly compressed mesh of 
contextual relationships, a world of relativity and relatedness 
that is more often associated with higher levels of visual 
processing such as form analysis and pattern recognition. 
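
The spectral analogy can be made concrete with a small sketch. It assumes a color restoration weight of the form beta * (log(alpha * I_i) - log(sum of I_j over the spectral bands)); the alpha and beta defaults below are illustrative placeholders and should be checked against the definition given earlier in the paper. Where the spatial retinex compares a pixel with a weighted sum over its spatial neighborhood, this weight compares one band with the sum over the spectral bands at the same pixel.

```python
import math


def color_restoration_weights(bands, alpha=125.0, beta=46.0):
    """Per-band weight beta * (log(alpha * I_i) - log(sum_j I_j)) for one pixel.

    `bands` holds strictly positive intensities, one per spectral band.
    The alpha and beta defaults are assumptions made for illustration.
    """
    total = sum(bands)
    return [beta * (math.log(alpha * value) - math.log(total)) for value in bands]


gray_weights = color_restoration_weights([100.0, 100.0, 100.0])
reddish_weights = color_restoration_weights([200.0, 50.0, 50.0])
```

For a gray pixel the three weights are equal, so the relative band balance is left alone. For the reddish pixel the dominant band receives the largest weight, which counteracts the desaturation toward gray described above.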

While there is no firm theoretical or mathematical basis 
for proving the generality of this color-restored MSR, we 
have tested it successfully on numerous diverse scenes and 
images, including some known to contain severe gray-world 
violations. No pathologies have yet been observed. Our tests 
were, however, confined to the conventional 8-b dynamic 
range images, and we expect that some refinements may be 
necessary when the wider dynamic range world of 10-12-b 
images is engaged. 

The fundamental goal of Visual Communication is to produce the best 
possible picture at the lowest data rate. This goal cannot be reached, as 
it has been pursued in the past, by treating image gathering, coding 
and restoration as separate and independent tasks. Instead, in a clear 
departure from the mores of the traditional image processing literature, 
this monograph rigorously extends modern communication theory to 
the integration of the two disciplines that are involved: the 
electro-optical design of image gathering and display devices and the digital 
processing for image coding and restoration. Extensive simulations 
demonstrate that this approach establishes, for the first time, a close 
correlation between predicted and actual performance. 






